Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms
Hi, I am using the standard highlighting. http://wiki.apache.org/solr/HighlightingParameters Cheers -- View this message in context: http://lucene.472066.n3.nabble.com/SolR-InvalidTokenOffsetsException-with-Highlighter-and-Synonyms-tp4053988p4056240.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Usage of CloudSolrServer?
Hi Shawn,

Sorry, but what kind of load balancing is that? Does it check whether some leaders are using a lot of CPU or RAM, for example? I think a problem could occur in this scenario: if some leaders receive more documents than others (I don't know how it is decided which shard a document goes to), then there will be a bottleneck on those leaders.

2013/4/15 Shawn Heisey s...@elyograg.org

On 4/15/2013 8:05 AM, Furkan KAMACI wrote:

My system is as follows: I crawl data with Nutch and send them into SolrCloud. Users will search at Solr. What is that CloudSolrServer, should I use it for load balancing or is it something else different?

It appears that the Solr integration in Nutch currently does not use CloudSolrServer. There is an issue to add it. The mutual dependency on HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses HttpClient 4.

https://issues.apache.org/jira/browse/NUTCH-1377

Until that is fixed, a load balancer would be required for full redundancy for updates with SolrCloud. You don't have to use a load balancer for it to work, but if the Solr server that Nutch is using goes down, then indexing will stop unless you reconfigure Nutch or bring the Solr server back up.

Thanks,
Shawn
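On the question of how a document ends up on a particular shard: with a hash-based router, the shard is chosen from a hash of the document id, not from CPU or RAM load on the leaders. The following is a minimal sketch of that idea, not Solr's router code. The class and method names are hypothetical, and String.hashCode() is only a stand-in for the hash function Solr actually uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of hash-range document routing; not Solr's actual router. */
class ShardRoutingSketch {

    /** One shard owning an inclusive slice of the signed 32-bit hash space. */
    static final class ShardRange {
        final String name;
        final int min;
        final int max;

        ShardRange(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }

        boolean contains(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Splits the full int range into numShards contiguous, near-equal slices. */
    static List<ShardRange> partition(int numShards) {
        List<ShardRange> ranges = new ArrayList<>();
        long step = (1L << 32) / numShards;
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last shard absorbs any remainder so the whole space is covered.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + step - 1;
            ranges.add(new ShardRange("shard" + (i + 1), (int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    /** Picks the shard whose range contains the hash of the document id. */
    static String shardFor(String docId, List<ShardRange> ranges) {
        int hash = docId.hashCode(); // assumption: stand-in hash, not Solr's
        for (ShardRange r : ranges) {
            if (r.contains(hash)) {
                return r.name;
            }
        }
        throw new IllegalStateException("ranges do not cover hash " + hash);
    }

    public static void main(String[] args) {
        List<ShardRange> ranges = partition(2);
        for (String id : new String[] {"doc-1", "doc-2", "http://example.org/page"}) {
            System.out.println(id + " -> " + shardFor(id, ranges));
        }
    }
}
```

Because the assignment depends only on the id, the same document always lands on the same shard, and how evenly documents spread depends on how well the hash distributes the ids.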
Re: Empty Solr 4.2.1 can not create Collection
Hi,

sorry for pushing, but I just replayed the steps with Solr 4.0, where everything works fine. Then I switched to Solr 4.2.1 and replayed the exact same steps, and the collection won't start and no leader is elected. Any clues? Should I try the developer mailing list? Maybe it's a bug.

Kind regards
Alexander

On 2013-04-10 22:27, A.Eibner wrote:

Hi, here is the clusterstate.json (from ZooKeeper) after creating the core:

{storage:{ shards:{shard1:{ range:8000-7fff, state:active, replicas:{app02:9985_solr_storage-core:{ shard:shard1, state:down, core:storage-core, collection:storage, node_name:app02:9985_solr, base_url:http://app02:9985/solr, router:compositeId}}
cZxid = 0x10024
ctime = Wed Apr 10 22:18:13 CEST 2013
mZxid = 0x1003d
mtime = Wed Apr 10 22:21:26 CEST 2013
pZxid = 0x10024
cversion = 0
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 467
numChildren = 0

But looking in the log files I found the following error (this also occurs with the Collections API):

SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'storage_shard1_replica1':
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:483)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:140)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.cloud.ZooKeeperException:
    at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931)
    at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
    at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479)
    ... 19 more
Caused by: java.lang.NullPointerException
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100)
    at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266)
    at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:761)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
    at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
    ... 22 more

Kind regards
Alexander

On 2013-04-10 19:12, Joel Bernstein wrote:

Can you post your clusterstate.json? After you spin up the initial core, it will automatically become leader for that shard.

On Wed, Apr 10, 2013 at 3:43 AM, A.Eibner a_eib...@yahoo.de wrote:

Hi Joel, I followed your steps. The cores and the collection get created, but no leader is elected, so I cannot query the collection... Am I missing something?

Kind regards
Alexander

On 2013-04-09 10:21, A.Eibner wrote:

Hi, thanks for your fast answer. You don't use the Collections API - may I ask why? Therefore you have to setup everything
Is cache useful for my scenario?
Hi,

I am new to Solr and wish to use version 4.2.x for my app in production. I want to show hundreds and thousands of markers on a map, with contents coming from Solr. As the user moves around the map and pans, the browser will fetch data/markers using a BBOX filter (based on the map's viewport boundary). There will be a lot of data indexed in Solr.

My question is: does caching help in my case? As the filter queries will vary for almost all users (because the viewport latitude/longitude will vary), in what ways can I use caching to increase performance? Should I turn off caching completely? If you can offer suggestions from your experience, that would be really nice.

Thanks
Sam

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-cache-useful-for-my-scenario-tp4056250.html
Re: Usage of CloudSolrServer?
If you are accessing Solr from Java code, you will likely use the SolrJ client to do so. If your users are hitting Solr directly, you should think about whether this is wise - as well as providing them with direct search access, you are also providing them with the ability to delete your entire index with a single command. SolrJ isn't really a load balancer as such. When SolrJ is used to make a request against a collection, it will ask Zookeeper for the names of the shards that make up that collection, and for the hosts/cores that make up the set of replicas for those shards. It will then choose one of those hosts/cores for each shard, and send a request to them as a distributed search request. This has the advantage over traditional load balancing that if you bring up a new node, that node will register itself with ZooKeeper, and thus your SolrJ client(s) will know about it, without any intervention. Upayavira On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote: Hi Shawn; I am sorry but what kind of Load Balancing is that? I mean does it check whether some leaders are using much CPU or RAM etc.? I think a problem may occur at such kind of scenario: if some of leaders getting more documents than other leaders (I don't know how it is decided that into which shard a document will go) than there will be a bottleneck on that leader? 2013/4/15 Shawn Heisey s...@elyograg.org On 4/15/2013 8:05 AM, Furkan KAMACI wrote: My system is as follows: I crawl data with Nutch and send them into SolrCloud. Users will search at Solr. What is that CloudSolrServer, should I use it for load balancing or is it something else different? It appears that the Solr integration in Nutch currently does not use CloudSolrServer. There is an issue to add it. The mutual dependency on HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses HttpClient 4. 
https://issues.apache.org/jira/browse/NUTCH-1377 Until that is fixed, a load balancer would be required for full redundancy for updates with SolrCloud. You don't have to use a load balancer for it to work, but if the Solr server that Nutch is using goes down, then indexing will stop unless you reconfigure Nutch or bring the Solr server back up. Thanks, Shawn
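The per-shard selection described in the reply above (look up the live replicas of each shard, then pick one per shard) can be sketched as follows. This is only an illustration under stated assumptions: the map of shards to replica URLs stands in for what the client would read from ZooKeeper, the host names are made up, and random choice is an assumed policy rather than a statement about what SolrJ does internally.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative sketch: choose one live replica per shard for a distributed request. */
class ReplicaPickSketch {

    /**
     * For every shard, picks one replica URL at random from its live replicas.
     * The input map is a hypothetical stand-in for cluster state read from ZooKeeper.
     */
    static Map<String, String> pickOnePerShard(Map<String, List<String>> liveReplicas, Random rnd) {
        Map<String, String> chosen = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : liveReplicas.entrySet()) {
            List<String> replicas = e.getValue();
            if (replicas.isEmpty()) {
                throw new IllegalStateException("no live replica for " + e.getKey());
            }
            chosen.put(e.getKey(), replicas.get(rnd.nextInt(replicas.size())));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical hosts; a node that registers later would simply appear in these lists.
        Map<String, List<String>> state = new LinkedHashMap<>();
        state.put("shard1", Arrays.asList("http://host1:8983/solr/c1", "http://host2:8983/solr/c1"));
        state.put("shard2", Arrays.asList("http://host3:8983/solr/c1", "http://host4:8983/solr/c1"));
        System.out.println(pickOnePerShard(state, new Random()));
    }
}
```

The point of the sketch is that the candidate list comes from shared cluster state, so a newly started node becomes eligible without reconfiguring the client.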
first time with new keyword, solr take to much time to give the result
Hi,

when we search with a new keyword for the first time, Solr 4.2.1 takes too much time to return the result. We have 506 documents indexed in Solr, and the index size is 400GB. When we search for the keyword test, it takes 1 min to give the response for 1 rows. We fire the query from a Java application using the SolrJ client. The behavior is the same with Solr 1.4, 3.5 and 4.2.1. All 400GB of data is indexed in one folder called Solr Home\data\index. After firing the query, when we open the resource management tool, it shows that most of the cost is disk I/O.

Any help would be appreciated.

Thanks
Regards
Montu v Boda

-- View this message in context: http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254.html
SEVERE: shard update error StdNode on SolrCloud 4.2.1
Hi

We have a simple SolrCloud setup (4.2.1) running with a single shard and two nodes, and it's working fine except whenever we send an update request, the leader logs this error:

SEVERE: shard update error StdNode: http://10.20.10.42:8080/solr/ts/:org.apache.solr.common.SolrException: Server at http://10.20.10.42:8080/solr/ts returned non ok status:500, message:Internal Server Error
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
    ...

Which triggers a lot of to-ing and fro-ing between the leader and the replica, starting with this response on the replica to the above:

INFO: [ts] webapp=/solr path=/update params={distrib.from=http://10.20.10.29:8080/solr/ts/&update.distrib=FROMLEADER&wt=javabin&version=2} {} 0 12
15-Apr-2013 16:38:23 org.apache.solr.common.SolrException log
SEVERE: java.lang.UnsupportedOperationException
    at org.apache.lucene.queries.function.FunctionValues.longVal(FunctionValues.java:46)
    at org.apache.solr.update.VersionInfo.getVersionFromIndex(VersionInfo.java:201)
    at org.apache.solr.update.UpdateLog.lookupVersion(UpdateLog.java:714)
    at org.apache.solr.update.VersionInfo.lookupVersion(VersionInfo.java:184)
    ...
At this point, the leader tells the replica to recover:

INFO: try and ask http://10.20.10.42:8080/solr to recover

Which it does:

15-Apr-2013 16:38:23 org.apache.solr.handler.admin.CoreAdminHandler handleRequestRecoveryAction

but the attempt to use PeerSync fails:

INFO: Attempting to PeerSync from http://10.20.10.29:8080/solr/ts/ core=ts - recoveringAfterStartup=false
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr START replicas=[http://10.20.10.29:8080/solr/ts/] nUpdates=100
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr Received 100 versions from 10.20.10.29:8080/solr/ts/
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr Our versions are too old. ourHighThreshold=1432379781917179904 otherLowThreshold=1432382177294680064
15-Apr-2013 16:38:26 org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=ts url=http://10.20.10.42:8080/solr DONE. sync failed
15-Apr-2013 16:38:26 org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. core=ts

Replication then proceeds correctly, and the node is brought up to date.

I'm guessing it's not supposed to work like this, but I'm having trouble finding anyone else with this problem, which makes me suspect we configured it wrong somewhere along the line, but I've checked it all per the documentation and I'm starting to run out of ideas. Any suggestions for where to look next would be most appreciated!

Regards,
Steve Woodcock
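The "Our versions are too old" line in the log above prints two thresholds, and the fallback to replication follows from comparing them. Here is a minimal sketch of that decision, assuming it is a plain comparison of the two logged values. That assumption is inferred from the log message only; the real PeerSync code in Solr does more than this, and the class and method names here are hypothetical.

```java
/** Illustrative sketch of the decision visible in the PeerSync log lines. */
class PeerSyncCheckSketch {

    /**
     * Assumption inferred from the log text: if our newest versions are older
     * than the oldest versions the peer can still offer, the recent-update
     * windows do not overlap and a peer sync cannot fill the gap.
     */
    static boolean tooOldForPeerSync(long ourHighThreshold, long otherLowThreshold) {
        return ourHighThreshold < otherLowThreshold;
    }

    /** Returns the recovery path the log shows being taken. */
    static String recoveryPath(long ourHighThreshold, long otherLowThreshold) {
        return tooOldForPeerSync(ourHighThreshold, otherLowThreshold) ? "replication" : "peersync";
    }

    public static void main(String[] args) {
        // Values copied from the log above.
        long ours = 1432379781917179904L;
        long theirs = 1432382177294680064L;
        System.out.println(recoveryPath(ours, theirs));
    }
}
```

With the values from the log, our high threshold is below the peer's low threshold, which matches the "sync failed" and "trying replication" lines that follow.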
Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms
Could be a bug in the highlighter. But before claiming that, I would still play around with different options, like hl.fragsize and hl.highlightMultiTerm. Also, have you considered storing synonyms in the index?

On Tue, Apr 16, 2013 at 9:42 AM, juancesarvillalba juancesarvilla...@gmail.com wrote:

Hi, I am using the standard highlighting. http://wiki.apache.org/solr/HighlightingParameters Cheers
Re: first time with new keyword, solr take to much time to give the result
Hi,

Things to google ;)

1. warmup queries
2. solr cache

How much RAM does your index take now?

Dmitry

On Tue, Apr 16, 2013 at 1:22 PM, Montu v Boda montu.b...@highqsolutions.com wrote:

Hi, when we search with a new keyword for the first time, Solr 4.2.1 takes too much time to return the result. We have 506 documents indexed in Solr, and the index size is 400GB. When we search for the keyword test, it takes 1 min to give the response for 1 rows. We fire the query from a Java application using the SolrJ client. The behavior is the same with Solr 1.4, 3.5 and 4.2.1. All 400GB of data is indexed in one folder called Solr Home\data\index. After firing the query, when we open the resource management tool, it shows that most of the cost is disk I/O. Any help would be appreciated.

Thanks
Regards
Montu v Boda
Re: first time with new keyword, solr take to much time to give the result
Hi,

currently Solr is deployed in tomcat1, and we have given that Tomcat 4GB of memory.

Thanks
Regards
Montu v Boda

-- View this message in context: http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056261.html
Re: Storing Solr Index on NFS
Hi Walter,

You said: "It is not safe to share Solr index files between two Solr servers." Why do you think so?

2013/4/16 Tim Vaillancourt t...@elementspace.com

If centralization of storage is your goal by choosing NFS, iSCSI works reasonably well with Solr indexes, although good local storage will always be the overall winner. I noticed a near 5% degradation in overall search performance (casual testing, nothing scientific) when moving 40-50GB indexes to iSCSI (10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup.

Tim

On 15/04/13 09:59 AM, Walter Underwood wrote:

Solr 4.2 does have field compression, which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, because I think the latency of NFS is what causes problems.

wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:

Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible.

Thanks,
Saqib

On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:

On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

Greetings, Are there any issues with storing Solr indexes on an NFS share? Also, any recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS. It can be very slow; I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS.

wunder
-- Walter Underwood wun...@wunderwood.org
Re: Usage of CloudSolrServer?
Thanks for your detailed explanation. However you said: It will then choose one of those hosts/cores for each shard, and send a request to them as a distributed search request. Is there any document that explains of distributed search? What is the criteria for it? 2013/4/16 Upayavira u...@odoko.co.uk If you are accessing Solr from Java code, you will likely use the SolrJ client to do so. If your users are hitting Solr directly, you should think about whether this is wise - as well as providing them with direct search access, you are also providing them with the ability to delete your entire index with a single command. SolrJ isn't really a load balancer as such. When SolrJ is used to make a request against a collection, it will ask Zookeeper for the names of the shards that make up that collection, and for the hosts/cores that make up the set of replicas for those shards. It will then choose one of those hosts/cores for each shard, and send a request to them as a distributed search request. This has the advantage over traditional load balancing that if you bring up a new node, that node will register itself with ZooKeeper, and thus your SolrJ client(s) will know about it, without any intervention. Upayavira On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote: Hi Shawn; I am sorry but what kind of Load Balancing is that? I mean does it check whether some leaders are using much CPU or RAM etc.? I think a problem may occur at such kind of scenario: if some of leaders getting more documents than other leaders (I don't know how it is decided that into which shard a document will go) than there will be a bottleneck on that leader? 2013/4/15 Shawn Heisey s...@elyograg.org On 4/15/2013 8:05 AM, Furkan KAMACI wrote: My system is as follows: I crawl data with Nutch and send them into SolrCloud. Users will search at Solr. What is that CloudSolrServer, should I use it for load balancing or is it something else different? 
It appears that the Solr integration in Nutch currently does not use CloudSolrServer. There is an issue to add it. The mutual dependency on HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses HttpClient 4. https://issues.apache.org/jira/browse/NUTCH-1377 Until that is fixed, a load balancer would be required for full redundancy for updates with SolrCloud. You don't have to use a load balancer for it to work, but if the Solr server that Nutch is using goes down, then indexing will stop unless you reconfigure Nutch or bring the Solr server back up. Thanks, Shawn
Re: Storing Solr Index on NFS
Furkan, see this post.

http://grokbase.com/t/lucene/solr-user/117t1eswyk/multiple-solr-servers-and-a-shared-index-again

Regards
-- Yago Riveiro

On Tuesday, April 16, 2013 at 12:15 PM, Furkan KAMACI wrote:

Hi Walter; You said: It is not safe to share Solr index files between two Solr servers. Why do you think like that?

2013/4/16 Tim Vaillancourt t...@elementspace.com

If centralization of storage is your goal by choosing NFS, iSCSI works reasonably well with Solr indexes, although good local storage will always be the overall winner. I noticed a near 5% degradation in overall search performance (casual testing, nothing scientific) when moving 40-50GB indexes to iSCSI (10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup.

Tim

On 15/04/13 09:59 AM, Walter Underwood wrote:

Solr 4.2 does have field compression, which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, because I think the latency of NFS is what causes problems.

wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:

Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible.

Thanks,
Saqib

On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote:

On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:

Greetings, Are there any issues with storing Solr indexes on an NFS share? Also, any recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS. It can be very slow; I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS.

wunder
-- Walter Underwood wun...@wunderwood.org
Solr 4.2.1 sorting by distance to polygon centre.
Hi,

I have everything in place and my polygons are indexing properly. I played a bit with LSP, which helped me a lot, and I now have JTS 1.13 inside solr.war. Here is my challenge: I have a big polygon (A) which contains smaller polygons (B and C). B and C intersect, so if I search for a coordinate inside all three, I would like to sort by the distance to the centre of each polygon that matches the criteria. As an example, let's say dot B is at the centre of B, dot C is at the centre of C, and dot A is at the intersection of B and C, which happens to be the centre of A; so for dot A, polygon A should come first, and so on. I could compute the distances from the result, but since Solr is already doing the heavy work, why not just include the sort in it?

Here is my field type definition:

    <!-- Spatial field type -->
    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory" units="degrees"/>

Field definition:

    <!-- JTS spatial polygon field -->
    <field name="geopolygon" type="location_rpt" indexed="true" stored="false" required="false" multiValued="true"/>

I'm using the Solr admin UI first to shape my query and then moving to our web app, which uses SolrJ. Here is the XML form of my result, which includes the query I'm making; it scores all distances as 1.0 (not what I want):

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">9</int>
        <lst name="params">
          <str name="fl">id,score</str>
          <str name="sort">score asc</str>
          <str name="indent">true</str>
          <str name="q">*:*</str>
          <str name="_">136620720</str>
          <str name="wt">xml</str>
          <str name="fq">{!score=distance}geopolygon:Intersects(-6.271906 53.379284)</str>
        </lst>
      </lst>
      <result name="response" numFound="3" start="0" maxScore="1.0">
        <doc><str name="id">uid13972</str><float name="score">1.0</float></doc>
        <doc><str name="id">uid13979</str><float name="score">1.0</float></doc>
        <doc><str name="id">uid13974</str><float name="score">1.0</float></doc>
      </result>
    </response>

Thanks for all responses,
Guido.
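The client-side alternative mentioned in the message above (computing the distances from the result) can be sketched as follows. It is only an illustration under stated assumptions: each matching polygon's centre is assumed to be available to the client from somewhere other than the geopolygon field, which is not stored; the centre coordinates in main are made up; and the haversine great-circle distance is one reasonable choice of metric, not something the thread prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch: sort matched polygons by distance from a point to their centres. */
class CentroidSortSketch {

    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle distance in kilometres between two lat/lon points (degrees). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    /** A matched document together with its (assumed known) polygon centre. */
    static final class Match {
        final String id;
        final double centreLat;
        final double centreLon;

        Match(String id, double centreLat, double centreLon) {
            this.id = id;
            this.centreLat = centreLat;
            this.centreLon = centreLon;
        }
    }

    /** Returns a new list ordered from nearest centre to farthest. */
    static List<Match> sortByDistance(List<Match> matches, double lat, double lon) {
        List<Match> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingDouble(
                m -> haversineKm(lat, lon, m.centreLat, m.centreLon)));
        return sorted;
    }

    public static void main(String[] args) {
        // Ids are from the response above; the centre coordinates are hypothetical.
        List<Match> matches = Arrays.asList(
                new Match("uid13972", 53.40, -6.30),
                new Match("uid13979", 53.379284, -6.271906),
                new Match("uid13974", 53.36, -6.25));
        // The query point from the fq above is written as "lon lat".
        for (Match m : sortByDistance(matches, 53.379284, -6.271906)) {
            System.out.println(m.id);
        }
    }
}
```

Note the argument order: the Intersects(...) clause in the query lists longitude first, while the helper here takes latitude first.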
Re: first time with new keyword, solr take to much time to give the result
On Tue, 2013-04-16 at 12:22 +0200, Montu v Boda wrote:

we have 506 document is index in solr and it's size is 400GB. now when We search for keyword test it will take 1 min to give the response for 1 rows.

At this point, you have searched for other keywords before you measure on the keyword test, right? The first search on a newly opened index is notoriously slow.

after fire the query, when we open the resource management then it will show that more cost is of Disk I/O

Both searching and value retrieval (for the 10K rows) require a lot of random access in Lucene/Solr and, I guess, just about every other comparable search engine. I will bet a cake that your underlying storage is spinning disks. When you perform a search for a keyword that has not been used before, or not in a while, the disk cache has little data for that search, so there will be a lot of random access to the underlying storage. Spinning disks are really bad at this.

any help would be helpfull to us

Short answer: Use an SSD.

Longer answer: You need to either lower the number of seeks or make them faster (or both). You lower the number of seeks by (in your case) copious amounts of RAM and a lot of warming of your searchers. You make the seeks faster by switching storage type. RAIDing spinning drives does not help much, as the benefits of this are higher bulk transfer rates and/or concurrent requests, whereas you need lower latency. You could buy faster spinning drives, but with current prices of SSDs I would really advise that you choose that road instead.

Regards,
Toke Eskildsen, State and University Library, Denmark
Re: first time with new keyword, solr take to much time to give the result
On the admin page you can monitor the cache parameters, such as evictions. If your cache evicts too much, you can increase its capacity. NOTE: this will affect RAM consumption, so you would need to change the Tomcat config too.

On Tue, Apr 16, 2013 at 2:08 PM, Montu v Boda montu.b...@highqsolutions.com wrote:

Hi, currently Solr is deployed in tomcat1, and we have given that Tomcat 4GB of memory.

Thanks
Regards
Montu v Boda
Re: Some Questions About Using Solr as Cloud
Yes. Every node is really self-contained. When you send a doc to a cluster where each shard has a replica, the raw doc is sent to each node of that shard and indexed independently. About old docs, it's the same as Solr 3.6. Data associated with docs stays around in the index until it's merged away. You cannot transfer just the indexed form of a document from one core to another, you have to re-index the doc. Best Erick On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Jack; I see that SolrCloud makes everything automated. When I use SolrCloud is it true that: there may be more than one computer responsible for indexing at any time? 2013/4/15 Jack Krupansky j...@basetechnology.com There are no masters or slaves in SolrCloud - it's fully distributed. Some cluster nodes will be leaders (of the shard on that node) at a given point in time, but different nodes may be leaders at different points in time as they become elected. In a distributed cluster you would never want to store documents only on one node. Sure, you can do that by setting the replication factor to 1, but that defeats half the purpose for SolrCloud. Index transfer is automatic - SolrCloud supports fully distributed update. You might be getting confused with the old Master-Slave-Replication model that Solr had (and still has) which is distinct from SolrCloud. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Sunday, April 14, 2013 7:45 PM To: solr-user@lucene.apache.org Subject: Some Questions About Using Solr as Cloud I read wiki and reading SolrGuide of Lucidworks. However I want to clear something in my mind. Here are my questions: 1) Does SolrCloud lets a multi master design (is there any document that I can read about it)? 2) Let's assume that I use multiple cores i.e. core A and core B. Let's assume that there is a document just indexed at core B. If I send a search request to core A can I get result? 
3) When I use multi master design (if exists) can I transfer one master's index data into another (with its slaves or not)? 4) When I use multi core design can I transfer one index data into another core or anywhere else? By the way thanks for the quick responses and kindness at mail list.
Re: Some Questions About Using Solr as Cloud
Hi Erick; Thanks for the explanation. You said: You cannot transfer just the indexed form of a document from one core to another, you have to re-index the doc. why do you think like that? 2013/4/16 Erick Erickson erickerick...@gmail.com Yes. Every node is really self-contained. When you send a doc to a cluster where each shard has a replica, the raw doc is sent to each node of that shard and indexed independently. About old docs, it's the same as Solr 3.6. Data associated with docs stays around in the index until it's merged away. You cannot transfer just the indexed form of a document from one core to another, you have to re-index the doc. Best Erick On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Jack; I see that SolrCloud makes everything automated. When I use SolrCloud is it true that: there may be more than one computer responsible for indexing at any time? 2013/4/15 Jack Krupansky j...@basetechnology.com There are no masters or slaves in SolrCloud - it's fully distributed. Some cluster nodes will be leaders (of the shard on that node) at a given point in time, but different nodes may be leaders at different points in time as they become elected. In a distributed cluster you would never want to store documents only on one node. Sure, you can do that by setting the replication factor to 1, but that defeats half the purpose for SolrCloud. Index transfer is automatic - SolrCloud supports fully distributed update. You might be getting confused with the old Master-Slave-Replication model that Solr had (and still has) which is distinct from SolrCloud. -- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Sunday, April 14, 2013 7:45 PM To: solr-user@lucene.apache.org Subject: Some Questions About Using Solr as Cloud I read wiki and reading SolrGuide of Lucidworks. However I want to clear something in my mind. Here are my questions: 1) Does SolrCloud lets a multi master design (is there any document that I can read about it)? 
2) Let's assume that I use multiple cores i.e. core A and core B. Let's assume that there is a document just indexed at core B. If I send a search request to core A can I get result? 3) When I use multi master design (if exists) can I transfer one master's index data into another (with its slaves or not)? 4) When I use multi core design can I transfer one index data into another core or anywhere else? By the way thanks for the quick responses and kindness at mail list.
Re: SolrException parsing error
Did you find anything? I have the same problem but it's on update requests only. The error comes from the solrj client indeed. It is solrj logging this error. There is nothing in solr itself and it does the update correctly. It's fairly small simple documents being updated. On 04/15/2013 07:49 PM, Shawn Heisey wrote: On 4/15/2013 9:47 AM, Luis Lebolo wrote: Hi All, I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException parsing error with root cause java.io.EOFException (see below for stack trace). The query I'm performing is long/complex and I wonder if its size is causing the issue? I am querying via POST through SolrJ. The query (fq) itself is ~20,000 characters long in the form of: fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR + mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR + mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ... In short, I am querying for an ID throughout multiple dynamically created fields (mutation_prot_mt_#_#). Any thoughts on how to further debug? 
Thanks in advance, Luis -- SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw exception [Request processing failed; nested exception is org.apache.solr.common.SolrException: parsing error] with root cause java.io.EOFException at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) I am guessing that this log is coming from your SolrJ client, but That is not completely clear, so is it SolrJ or Solr that is logging this error? If it's SolrJ, do you see anything in the Solr log, and vice versa? This looks to me like a network problem, where something is dropping the connection before transfer is complete. It could be an unusual server-side config, OS problems, timeout settings in the SolrJ code, NIC drivers/firmware, bad cables, bad network hardware, etc. Thanks, Shawn
Re: first time with new keyword, solr take to much time to give the result
Hi, thanks for the info. We did the same thing, but it had no effect on the first query. What can we do about the first query with a new keyword, and how can we make it faster? For example, if I search for the keyword "test" for the first time, it takes too much time to execute; the second time, the same keyword is much faster. Thanks and regards, Montu v Boda
Re: Query Parser OR AND and NOT
The query language is NOT pure boolean. Hoss wrote this up: http://searchhub.org/2011/12/28/why-not-and-or-and-not/ Best Erick On Mon, Apr 15, 2013 at 12:54 PM, Roman Chyla roman.ch...@gmail.com wrote: Oh, sorry, I had assumed the Lucene query parser. I think the Solr query parser must be different then, because for me it works as expected (our query parser is identical to Lucene's in the way it treats the modifiers +/- and the operators AND/OR/NOT: NOT must join two clauses, as in a NOT b, and the first cannot be negative, as Chris points out; the modifier, however, can come first, but it cannot be alone, there must be at least one positive clause). Otherwise, -field:x is changed into field:x http://labs.adsabs.harvard.edu/adsabs/search/?q=%28*+-abstract%3Ablack%29+AND+abstract%3Ahole*&db_key=ASTRONOMY&sort_type=DATE http://labs.adsabs.harvard.edu/adsabs/search/?q=%28-abstract%3Ablack%29+AND+abstract%3Ahole*&db_key=ASTRONOMY&sort_type=DATE roman On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote: Hello, Roman Chyla roman.ch...@gmail.com wrote in news:caen8dywjrl+e3b0hpc9ntlmjtrkasrqlvkzhkqxopmlhhfn...@mail.gmail.com: should be: -city:H* OR zip:30* -city:H* OR zip:30* numFound:2520 gives the same wrong result. Another idea? Ciao Peter Schütt
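A note on the pure-negative case discussed in this thread: a clause that only excludes documents has nothing to subtract from, so a common workaround is to pair it with a match-all clause, much as Roman's first URL does with (* -abstract:black). The helper below is only a string-rewriting sketch under that assumption; the class and method names are made up for illustration, and *:* is the usual match-all query.

```java
final class NegativeClauseFix {

    /**
     * Wraps a purely negative clause such as "-city:H*" as "(*:* -city:H*)".
     * Clauses that are not negative are returned unchanged (trimmed).
     */
    static String wrapPureNegative(String clause) {
        String trimmed = clause.trim();
        if (trimmed.startsWith("NOT ")) {
            // Normalize the operator form to the "-" modifier form.
            trimmed = "-" + trimmed.substring(4).trim();
        }
        return trimmed.startsWith("-") ? "(*:* " + trimmed + ")" : trimmed;
    }

    public static void main(String[] args) {
        // Peter's query, with the negative clause given something to subtract from.
        String q = wrapPureNegative("-city:H*") + " OR " + wrapPureNegative("zip:30*");
        System.out.println(q);
    }
}
```

Whether this yields the count Peter expects depends on his data; the sketch only shows the rewrite itself.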
how to display groups along with matching terms in solr auto-suggestion?
Hi, I have used the Terms component for auto-suggestion, but it only lists the index terms that match terms.prefix. Along with these term suggestions, I have to display the product groups that match the input prefix. Is this possible with Solr auto-suggest? Could somebody please help me with this issue?
SolrCloud Leader Response Mechanism
When a leader responds to a query, does it say: "If I have the data I am looking for, I should build the response with it; otherwise I should find it elsewhere, because searching for it may take long"? Or does it say: "I only index the data; I will tell the other nodes to build the response to the query"?
Function Query performance in combination with filters
Hi, I am using pretty complex function queries to completely customize (not only boost) the score of my result documents, which are retrieved from an index of approx. 10e7 documents. To get to an acceptable level of performance I combine my query with filters in the following way (very short example): q=_val_:sum(termfreq(fieldname,`word`),termfreq(fieldname2,`word2`))&fq=fieldname:`word`&fq=fieldname2:`word2` Although I always have (because of the filters) approx. 50.000 docs in the result set, the search times vary (depending on the actual query) between 100ms and 6000ms. My understanding was that the scoring function is only applied to the result set from the filters. But based on what I am seeing, it seems that a lot more documents are actually put through the _val_ function. Is there a way to fully compute the score of only the documents in the result set? Thanks, Nico
Re: Usage of CloudSolrServer?
I cannot say that I have researched it, but I have always taken it to be random. Upayavira On Tue, Apr 16, 2013, at 12:23 PM, Furkan KAMACI wrote: Thanks for your detailed explanation. However you said: It will then choose one of those hosts/cores for each shard, and send a request to them as a distributed search request. Is there any document that explains of distributed search? What is the criteria for it? 2013/4/16 Upayavira u...@odoko.co.uk If you are accessing Solr from Java code, you will likely use the SolrJ client to do so. If your users are hitting Solr directly, you should think about whether this is wise - as well as providing them with direct search access, you are also providing them with the ability to delete your entire index with a single command. SolrJ isn't really a load balancer as such. When SolrJ is used to make a request against a collection, it will ask Zookeeper for the names of the shards that make up that collection, and for the hosts/cores that make up the set of replicas for those shards. It will then choose one of those hosts/cores for each shard, and send a request to them as a distributed search request. This has the advantage over traditional load balancing that if you bring up a new node, that node will register itself with ZooKeeper, and thus your SolrJ client(s) will know about it, without any intervention. Upayavira On Tue, Apr 16, 2013, at 08:36 AM, Furkan KAMACI wrote: Hi Shawn; I am sorry but what kind of Load Balancing is that? I mean does it check whether some leaders are using much CPU or RAM etc.? I think a problem may occur at such kind of scenario: if some of leaders getting more documents than other leaders (I don't know how it is decided that into which shard a document will go) than there will be a bottleneck on that leader? 2013/4/15 Shawn Heisey s...@elyograg.org On 4/15/2013 8:05 AM, Furkan KAMACI wrote: My system is as follows: I crawl data with Nutch and send them into SolrCloud. Users will search at Solr. 
What is that CloudSolrServer, should I use it for load balancing or is it something else different? It appears that the Solr integration in Nutch currently does not use CloudSolrServer. There is an issue to add it. The mutual dependency on HttpClient is holding it up - Nutch uses HttpClient 3, SolrJ 4.x uses HttpClient 4. https://issues.apache.org/jira/browse/NUTCH-1377 Until that is fixed, a load balancer would be required for full redundancy for updates with SolrCloud. You don't have to use a load balancer for it to work, but if the Solr server that Nutch is using goes down, then indexing will stop unless you reconfigure Nutch or bring the Solr server back up. Thanks, Shawn
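To make Upayavira's description more concrete, here is a toy simulation of "choose one host/core per shard". It is not SolrJ code and makes no claim about how CloudSolrServer is actually implemented: the cluster map stands in for what would be read from ZooKeeper, the host names are hypothetical, and the random choice reflects only the recollection in this thread that selection is random.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class ReplicaPicker {

    /** Picks one replica URL per shard at random, as described in the thread. */
    static Map<String, String> pickOnePerShard(Map<String, List<String>> replicasByShard, Random rnd) {
        Map<String, String> chosen = new LinkedHashMap<String, String>();
        for (Map.Entry<String, List<String>> e : replicasByShard.entrySet()) {
            List<String> replicas = e.getValue();
            chosen.put(e.getKey(), replicas.get(rnd.nextInt(replicas.size())));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical cluster layout; in SolrCloud this would come from ZooKeeper.
        Map<String, List<String>> cluster = new LinkedHashMap<String, List<String>>();
        cluster.put("shard1", Arrays.asList("http://host1:8983/solr/c1", "http://host2:8983/solr/c1"));
        cluster.put("shard2", Arrays.asList("http://host3:8983/solr/c1", "http://host4:8983/solr/c1"));
        System.out.println(pickOnePerShard(cluster, new Random()));
    }
}
```

Note that nothing in this selection looks at CPU or RAM usage, which is the point Furkan was asking about.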
terms starting with multilingual character don't list on solr auto-suggestion list
Hi, I have used /terms for the Solr auto-suggestion list. It works fine for English words, but I have a problem with multi-language index words; I have tested with Russian. If the Russian characters are in the middle of the word, the word is displayed in the suggestion list: if I type 'кар', it lists карабином. But if the Russian character is the first/initial character, as in Фляга, and I start typing Фля, it does not list the word starting with this prefix (here Фляга). Could somebody please help me with this issue?
Re: SolrException parsing error
Turns out I spoke too soon. I was *not* sending the query via POST. Changing the method to POST solved the issue for me (maybe I was hitting a GET limit somewhere?). -Luis On Tue, Apr 16, 2013 at 7:38 AM, Marc des Garets m...@ttux.net wrote: Did you find anything? I have the same problem but it's on update requests only. The error comes from the solrj client indeed. It is solrj logging this error. There is nothing in solr itself and it does the update correctly. It's fairly small simple documents being updated. On 04/15/2013 07:49 PM, Shawn Heisey wrote: On 4/15/2013 9:47 AM, Luis Lebolo wrote: Hi All, I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException parsing error with root cause java.io.EOFException (see below for stack trace). The query I'm performing is long/complex and I wonder if its size is causing the issue? I am querying via POST through SolrJ. The query (fq) itself is ~20,000 characters long in the form of: fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR + mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR + mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ... In short, I am querying for an ID throughout multiple dynamically created fields (mutation_prot_mt_#_#). Any thoughts on how to further debug? 
Thanks in advance, Luis -- SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw exception [Request processing failed; nested exception is org.apache.solr.common.SolrException: parsing error] with root cause java.io.EOFException at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) I am guessing that this log is coming from your SolrJ client, but That is not completely clear, so is it SolrJ or Solr that is logging this error? If it's SolrJ, do you see anything in the Solr log, and vice versa? This looks to me like a network problem, where something is dropping the connection before transfer is complete. It could be an unusual server-side config, OS problems, timeout settings in the SolrJ code, NIC drivers/firmware, bad cables, bad network hardware, etc. Thanks, Shawn
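For anyone wondering how quickly a filter like Luis's outgrows a GET request, the sketch below rebuilds an fq of the same shape and compares its URL-encoded length against a limit. The limit of 8192 bytes is an assumption for illustration (containers and proxies differ), and the class and method names are hypothetical; only the field naming pattern mutation_prot_mt_#_# comes from the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class LongFilterQuery {

    /** Assumed request-line/header budget; the real limit depends on the container. */
    static final int ASSUMED_GET_LIMIT_BYTES = 8192;

    /** Builds (f_1_1:id OR f_2_1:id ...) OR (f_1_2:id OR ...) for the given grid size. */
    static String buildFq(int id, int rows, int cols) {
        StringBuilder sb = new StringBuilder();
        for (int c = 1; c <= cols; c++) {
            if (c > 1) sb.append(" OR ");
            sb.append('(');
            for (int r = 1; r <= rows; r++) {
                if (r > 1) sb.append(" OR ");
                sb.append("mutation_prot_mt_").append(r).append('_').append(c).append(':').append(id);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    /** Length of the parameter once URL-encoded, as it would appear in a GET URL. */
    static int encodedLength(String fq) {
        try {
            return ("fq=" + URLEncoder.encode(fq, "UTF-8")).length();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** True when the encoded parameter alone exceeds the assumed GET budget. */
    static boolean needsPost(String fq) {
        return encodedLength(fq) > ASSUMED_GET_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        String fq = buildFq(2374, 30, 20);
        System.out.println(encodedLength(fq) + " encoded chars, POST needed: " + needsPost(fq));
    }
}
```

In SolrJ 4.x the method can be chosen per request by passing SolrRequest.METHOD.POST as the second argument to SolrServer.query(params, method); the default for query() is GET, which would be consistent with what Luis found.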
Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start
Hi, We are migrating to Solr 4.2 from Solr 3.6 and Solr 4.2 is throwing Exception on Restart. What is More, it take a hell lot of Time ( More than one hour to get Up and Running) THE exception After Restart ... = Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates update WARNING: Unexpected log entry or corrupt log. Entry=11 java.lang.ClassCastException: java.lang.Long cannot be cast to java.util.List at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929) at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863) at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014) at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253) at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82) at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:137) at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:123) at org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525) at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596) at org.apache.solr.core.SolrCore.init(SolrCore.java:806) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at 
java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Apr 16, 2013 4:47:31 PM org.apache.solr.update.UpdateLog$RecentUpdates update WARNING: Unexpected log entry or corrupt log. Entry=8120?785879438123 java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List at org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:929) at org.apache.solr.update.UpdateLog$RecentUpdates.access$000(UpdateLog.java:863) at org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1014) at org.apache.solr.update.UpdateLog.init(UpdateLog.java:253) at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:82) at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:137) at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:123) at org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95) = And Once Restarted, I start getting replication errors Apr 16, 2013 5:20:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex SEVERE: Master at: http://localhost:25280/solr/accessories is not available. Index fetch failed. Exception: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:25280/solr/accessories Apr 16, 2013 5:20:30 PM org.apache.solr.handler.SnapPuller fetchLatestIndex SEVERE: Master at: http://localhost:25280/solr/newQueries is not available. Index fetch failed. 
Exception: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:25280/solr/newQueries Apr 16, 2013 5:21:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex SEVERE: Master at: http://localhost:25280/solr/phcare is not available. Index fetch failed. Exception: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:25280/solr/phcare Apr 16, 2013 5:21:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex SEVERE: Master at: http://localhost:25280/solr/audioplayersCore is not available. Index fetch failed. Exception: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:25280/solr/audioplayersCore Apr 16, 2013 5:21:24 PM
Re: first time with new keyword, solr take to much time to give the result
Are you actually trying to return 10,000 records, or is that the number of hits, and you're only retrieving the top 10? Cheers, Duncan. On 16 April 2013 12:39, Montu v Boda montu.b...@highqsolutions.com wrote: Hi Thanks for info. we did the same thing but no effect for first time. what to do for first time query with new keyword? how we can make the query faster for first time with new keyword? say for ex if i try to search the text key word test first time then it will take to much time to execute. for second time the same keyword works faster... Thanks Regards Montu v Boda -- View this message in context: http://lucene.472066.n3.nabble.com/first-time-with-new-keyword-solr-take-to-much-time-to-give-the-result-tp4056254p4056276.html Sent from the Solr - User mailing list archive at Nabble.com. -- Don't let your mind wander -- it's too little to be let out alone.
Re: Function Query performance in combination with filters
On Tue, Apr 16, 2013 at 7:51 AM, Rogalon nico.beche...@me.com wrote: Hi, I am using pretty complex function queries to completely customize (not only boost) the score of my result documents that are retrieved from an index of approx 10e7 documents. To get to an acceptable level of performance I combine my query with filters in the following way (very short example): q=_val_:sum(termfreq(fieldname,`word`),termfreq(fieldname2,`word2`))&fq=fieldname:`word`&fq=fieldname2:`word2` Although I always have (because of the filter) approx 50.000 docs in the result set, the search times vary (depending on the actual query) between 100ms and 6000ms. My understanding was that the scoring function is only applied to the result set from the filters. That should be the case. But based on what I am seeing it seems that a lot more documents are actually put through the _val_ function. How did you verify this? -Yonik http://lucidworks.com
Re: terms starting with multilingual character don't list on solr auto-suggestion list
Can you share your auto-complete/suggestor configuration parameters? Including the search component. It sounds as if there is a field type with an analyzer that is mapping characters. -- Jack Krupansky -Original Message- From: sharmila thapa Sent: Tuesday, April 16, 2013 7:54 AM To: solr-user@lucene.apache.org Subject: terms starting with multilingual character don't list on solr auto-suggestion list Hi, I have used /terms for solr auto-suggestion list. It works fine for English words. But I have problem on multi-language index words, I have tested for Russian language. If there is Russian charcter in between the word, then it gets displayed on suggesstion list like if I type 'кар', it list карабином , but if the russian character is the first/initial character like Фляга and I start type Фля, it does not list the word starting with this prefix (here Фляга). somebody could please help me on this issue?
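One thing that may be worth ruling out alongside Jack's question: the working example ('кар') is lowercase while the failing one ('Фля') starts with a capital letter. The terms.prefix value is compared against the indexed terms as they are stored, so if the field's analyzer lowercases at index time (an assumption, since the configuration was not posted), a capitalized prefix would find nothing. The sketch below lowercases the typed prefix and URL-encodes it as UTF-8; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

final class TermsPrefix {

    /** Lowercases the typed text and returns it as a UTF-8 encoded terms.prefix parameter. */
    static String toTermsParam(String typed) {
        String lowered = typed.toLowerCase(Locale.ROOT);
        try {
            return "terms.prefix=" + URLEncoder.encode(lowered, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "\u0424\u043B\u044F" is the capitalized Cyrillic prefix from the thread (Фля).
        System.out.println(toTermsParam("\u0424\u043B\u044F"));
    }
}
```

If the lowercased prefix still returns nothing, the servlet container's handling of non-ASCII characters in the URL would be the next thing to check.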
Re: SolR InvalidTokenOffsetsException with Highlighter and Synonyms
Hi, At the moment I am not considering storing synonyms in the index, although it is something I will have to do at some point. It is strange that something as common as multi-word synonyms has a bug with highlighting, but I couldn't find any solution. Thanks for your help.
Re: first time with new keyword, solr take to much time to give the result
Hi, we are trying to return 10,000 rows. It is necessary to return 10,000 rows because from those 10,000 we pick only the top 100 records based on the user's permissions, and the permissions are stored in a database, not in Solr. If we returned only 100 rows, it is possible that the user would not have permission for any of those 100 documents, and the user would get a blank search result. Thanks and regards, Montu v Boda
Re: first time with new keyword, solr take to much time to give the result
Hi Montu, Regarding permissions, you may find this solution more elegant: http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/ http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html --- On Tue, 4/16/13, Montu v Boda montu.b...@highqsolutions.com wrote: From: Montu v Boda montu.b...@highqsolutions.com Subject: Re: first time with new keyword, solr take to much time to give the result To: solr-user@lucene.apache.org Date: Tuesday, April 16, 2013, 4:13 PM hi we are trying to return 10,000 rows it is necessary to return 10,000 rows because from that 10,000, we are pick only top 100 record based on the user permission and permission is stored in database not on solr. and if we try to return 100 rows then it may possible that from the 100 rows, user does not have permission of any document. user will get blank search result. Thanks Regards Montu v Boda
Re: first time with new keyword, solr take to much time to give the result
On Tue, Apr 16, 2013 at 3:13 PM, Montu v Boda montu.b...@highqsolutions.com wrote: hi we are trying to return 10,000 rows it is necessary to return 10,000 rows because from that 10,000, we are pick only top 100 record based on the user permission and permission is stored in database not on solr. and if we try to return 100 rows then it may possible that from the 100 rows, user does not have permission of any document. user will get blank search result. You may have some other options: 1) Add the access rights to Solr, and have a front-end that takes a user id and expands it into a set of access rights (groups, mainly) for the user. This is then added as a filter to the queries. 2) Run the query with a smaller number of hits requested, and use the start parameter to fetch more hits (if necessary). Also, you may want to restrict the fields returned by your query to the minimal set required.
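Option 2 above can be sketched as a loop that asks for small pages and stops as soon as enough permitted documents have been collected. This is an illustration only: PageFetcher stands in for one Solr request with the given start and rows, and PermissionCheck stands in for the database lookup; both interfaces and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PermissionPager {

    /** Hypothetical stand-in for one Solr request using the start and rows parameters. */
    interface PageFetcher {
        List<String> fetch(int start, int rows);
    }

    /** Hypothetical stand-in for the permission lookup in the database. */
    interface PermissionCheck {
        boolean allowed(String docId);
    }

    /** Fetches page after page until 'wanted' permitted ids are found or the hits run out. */
    static List<String> collectPermitted(PageFetcher fetcher, PermissionCheck check, int wanted, int pageSize) {
        List<String> result = new ArrayList<String>();
        int start = 0;
        while (result.size() < wanted) {
            List<String> page = fetcher.fetch(start, pageSize);
            for (String id : page) {
                if (check.allowed(id)) {
                    result.add(id);
                    if (result.size() == wanted) {
                        break;
                    }
                }
            }
            if (page.size() < pageSize) {
                break; // a short (or empty) page means there are no more hits
            }
            start += pageSize;
        }
        return result;
    }
}
```

With wanted=100 and a modest page size, a user who has access to most documents is served by the first request, and only users with very restricted access cause deeper paging.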
Re: Function Query performance in combination with filters
On 16 April 2013 at 14:46, Yonik Seeley-4 [via Lucene] ml-node+s472066n4056299...@n3.nabble.com wrote: On Tue, Apr 16, 2013 at 7:51 AM, Rogalon [hidden email] wrote: Hi, I am using pretty complex function queries to completely customize (not only boost) the score of my result documents that are retrieved from an index of approx 10e7 documents. To get to an acceptable level of performance I combine my query with filters in the following way (very short example): q=_val_:sum(termfreq(fieldname,`word`),termfreq(fieldname2,`word2`))&fq=fieldname:`word`&fq=fieldname2:`word2` Although I always have (because of the filter) approx 50.000 docs in the result set, the search times vary (depending on the actual query) between 100ms and 6000ms. My understanding was that the scoring function is only applied to the result set from the filters. That should be the case. But based on what I am seeing it seems that a lot more documents are actually put through the _val_ function. How did you verify this? Thanks for taking a look at my problem. For now, I verified it just by looking at the query times and doing some simple experiments. If I am not using the function query at all (q=*:*&fq=...), the approx. 50.000 results from the filters are always returned within 200-300ms. This is pretty stable. If I have a (test) index of only 50.000 documents (instead of the 10e7 index) and I pass every document through the _val_ query (without any filters), this takes about 150ms, which in my case would be OK. Applying no filters to the function query on the 10e7 index leads to search times of about 6000ms, which is too much. But if I use the filters as stated above, I get 50.000 documents returned but the query times suddenly start to vary between 100ms and 6000ms. Some of my filters might actually be on stop words which appear in every other document in the index, but that seems to really hurt performance only if the function query is used. 
Greetings, Nico
Re: first time with new keyword, solr take to much time to give the result
Why not just add a filter query for user permissions? -- Jack Krupansky -Original Message- From: Montu v Boda Sent: Tuesday, April 16, 2013 9:13 AM To: solr-user@lucene.apache.org Subject: Re: first time with new keyword, solr take to much time to give the result hi we are trying to return 10,000 rows it is necessary to return 10,000 rows because from that 10,000, we are pick only top 100 record based on the user permission and permission is stored in database not on solr. and if we try to return 100 rows then it may possible that from the 100 rows, user does not have permission of any document. user will get blank search result. Thanks Regards Montu v Boda
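A filter query for permissions, as Jack suggests, could be assembled along the following lines. This is a sketch under the assumption that each document carries the groups allowed to see it in a field; the field name acl_group and the class name are hypothetical and do not come from the thread.

```java
import java.util.Arrays;
import java.util.List;

final class AclFilter {

    /** Hypothetical multi-valued field holding the groups allowed to see a document. */
    static final String ACL_FIELD = "acl_group";

    /** Builds e.g. acl_group:("sales" OR "support") from the user's groups. */
    static String buildFq(List<String> groups) {
        if (groups.isEmpty()) {
            throw new IllegalArgumentException("user has no groups, nothing can match");
        }
        StringBuilder sb = new StringBuilder(ACL_FIELD).append(":(");
        for (int i = 0; i < groups.size(); i++) {
            if (i > 0) sb.append(" OR ");
            // Quote each value and escape backslashes and quotes inside it.
            String escaped = groups.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println("fq=" + buildFq(Arrays.asList("sales", "support")));
    }
}
```

With such a filter in place the query could ask for rows=100 directly, since every returned document is already one the user may see. It does require the group information to be kept current in the index.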
Re: SolrException parsing error
Problem solved for me as well. The client is running in tomcat and the connector had compression=true. I removed it and now it seems to work fine. On 04/16/2013 02:28 PM, Luis Lebolo wrote: Turns out I spoke too soon. I was *not* sending the query via POST. Changing the method to POST solved the issue for me (maybe I was hitting a GET limit somewhere?). -Luis On Tue, Apr 16, 2013 at 7:38 AM, Marc des Garets m...@ttux.net wrote: Did you find anything? I have the same problem but it's on update requests only. The error comes from the solrj client indeed. It is solrj logging this error. There is nothing in solr itself and it does the update correctly. It's fairly small simple documents being updated. On 04/15/2013 07:49 PM, Shawn Heisey wrote: On 4/15/2013 9:47 AM, Luis Lebolo wrote: Hi All, I'm using Solr 4.1 and am receiving an org.apache.solr.common.SolrException parsing error with root cause java.io.EOFException (see below for stack trace). The query I'm performing is long/complex and I wonder if its size is causing the issue? I am querying via POST through SolrJ. The query (fq) itself is ~20,000 characters long in the form of: fq=(mutation_prot_mt_1_1:2374 + OR + mutation_prot_mt_2_1:2374 + OR + mutation_prot_mt_3_1:2374 + ...) + OR + (mutation_prot_mt_1_2:2374 + OR + mutation_prot_mt_2_2:2374 + OR + mutation_prot_mt_3_2:2374+...) + OR + ... In short, I am querying for an ID throughout multiple dynamically created fields (mutation_prot_mt_#_#). Any thoughts on how to further debug? 
Thanks in advance, Luis -- SEVERE: Servlet.service() for servlet [X] in context with path [/x] threw exception [Request processing failed; nested exception is org.apache.solr.common.SolrException: parsing error] with root cause java.io.EOFException at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:193) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:107) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:387) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301) I am guessing that this log is coming from your SolrJ client, but That is not completely clear, so is it SolrJ or Solr that is logging this error? If it's SolrJ, do you see anything in the Solr log, and vice versa? This looks to me like a network problem, where something is dropping the connection before transfer is complete. It could be an unusual server-side config, OS problems, timeout settings in the SolrJ code, NIC drivers/firmware, bad cables, bad network hardware, etc. Thanks, Shawn
Same Shards at Different Machines
Is it possible to host the same shards on different machines in SolrCloud?
Re: first time with new keyword, solr take to much time to give the result
Hi, the problem is that permissions are updated frequently in our system, so we would have to update the index just as often; otherwise it would give wrong results. In that case I think the caches would be affected and performance might be reduced. Thanks and regards, Montu v Boda
Re: Solr 4.2.1 sorting by distance to polygon centre.
Guido, The field type solr.SpatialRecursivePrefixTreeFieldType can only participate in distance reporting for indexed points, not other shapes. In fact, I recommend not attempting to get the distance if the field isn't purely indexed points, as it may get confused if it sees some small shapes. For your use-case, you should index an additional solr.SpatialRecursivePrefixTreeFieldType field just for the points. You could do this external to Solr, or you could write a Solr UpdateRequestProcessor that parses the shape in order to then call getCenter(), and put those points in the other field. ~ David On 4/16/13 7:23 AM, Guido Medina guido.med...@temetra.com wrote: Hi, I got everything in place, my polygons are indexing properly, I played a bit with LSP which helped me a lot, now, I have JTS 1.13 inside solr.war; here is my challenge: I have a big polygon (A) which contains smaller polygons (B and C), B and C have some intersection, so if I search for a coordinate inside the 3, I would like to sort by the distance to the centre of the polygons that match the criteria. As an example, let's say dot B is at the centre of B, dot C is at the centre of C and dot A is at the intersection of B and C, which happens to be the centre of A, so for dot A, polygon A should come first, and so on. I could compute the distances from the result, but since Solr is doing the heavy lifting already, why not just include the sort in it. 
Here is my field type definition: !-- Spatial field type -- fieldType name=location_rpt class=solr.SpatialRecursivePrefixTreeFieldType spatialContextFactory=com.spatial4j.core.context.jts.JtsSpatialContextFac tory units=degrees/ Field definition: !-- JTS spatial polygon field -- field name=geopolygon type=location_rpt indexed=true stored=false required=false multiValued=true/ I'm using the Solr admin UI first to shape my query and then moving to our web app which uses solrj, here is the XML form of my result which includes the query I'm making, which scores all distances to 1.0 (Not what I want): |?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status0/int int name=QTime9/int lst name=params str name=flid,score/str str name=sortscore asc/str str name=indenttrue/str str name=q*:*/str str name=_136620720/str str name=wtxml/str str name=fq{!score=distance}geopolygon:Intersects(-6.271906 53.379284)/str /lst /lst result name=response numFound=3 start=0 maxScore=1.0 doc str name=iduid13972/str float name=score1.0/float/doc doc str name=iduid13979/str float name=score1.0/float/doc doc str name=iduid13974/str float name=score1.0/float/doc /result /response| Thanks for all responses, Guido.
Re: using maven to deploy solr on tomcat
The problem is that I need to deploy it on servers where I don't know what the absolute path will be. Basically, my goal is to load Solr with a different set of configuration files based on the environment it's in. Is there a better or different way to do this?
On Mon, Apr 15, 2013 at 11:29 PM, Shawn Heisey s...@elyograg.org wrote: On 4/15/2013 2:33 PM, Adeel Qureshi wrote: <Environment name="solr/home" override="true" type="java.lang.String" value="src/main/resources/solr-dev"/> but this leads to absolute path of INFO: Using JNDI solr.home: src/main/resources/solr-dev INFO: looking for solr.xml: C:\springsource\sts-2.8.1.RELEASE\src\main\resources\solr-dev\solr.xml
If you use a relative path for the solr home as you have done, it will be relative to the current working directory. The CWD can vary depending on how tomcat gets started. In your case, the CWD seems to be C:\springsource\sts-2.8.1.RELEASE. If you change the CWD in the tomcat startup, you will probably have to set the TOMCAT_HOME environment variable for tomcat to start correctly, so I don't recommend doing that. It is usually best to choose an absolute path for the solr home. Solr will find solr.xml there, which it will use to find the rest of your config(s). All paths in solr.xml and other solr config files can be relative. What you are seeing as an absolute path is likely the current working directory plus your solr home setting. Thanks, Shawn
Re: Solr 4.2.1 sorting by distance to polygon centre.
David, I have been following your Stack Overflow posts and I understand what you say. We decided to change the criteria and index an extra field (close to your suggestion), so the sorting will now happen by polygon area, descending (which introduced another problem: calculating the area of a polygon on a sphere). I have finally got to the point of testing. Also, given what you are saying, it is not a good idea to ask for more than the bare use of point-inside-polygon (Intersects) to get the list that matches specific criteria. To summarize: calculate the area of the polygon (again, for curved polygons this is not so obvious), do the standard Solr search, and sort by that extra field; I guess the Solr overhead will be minimal in that case. The real use case is for the utility industry. Let's say users have areas where they take meter reads; readings are scheduled and assigned to the users whose area contains the meter's GPS location. Some users might cover big areas, and there may be smaller areas for other users inside those big areas, so we replaced distance-to-centre with area covered, which seemed simpler and easier. Thanks for your response, Guido.
Re: Dynamic data model design questions
Shawn Heisey wrote: Solr does have some *very* limited capability for doing joins between indexes, but generally speaking, you need to flatten the data.
Thanks! So, using a dynamic schema, I'd flatten the following JSON object graph
{ 'id':'xyz123', 'obj1': { 'child1': { 'prop1': ['val1', 'val2', 'val3'], 'prop2': 123 }, 'prop3': 'val4' }, 'obj2': { 'child2': { 'prop3': true } } }
to a Solr document something like this?
{ 'id':'xyz123', 'obj1/child1/prop1_ss': ['val1', 'val2', 'val3'], 'obj1/child1/prop2_i': 123, 'obj1/prop3_s': 'val4', 'obj2/child2/prop3_b': true }
I'm using Java, so I'd probably push docs for indexing to Solr and do the searches using SolrJ, right?
Solr's ability to change your data after receiving it is fairly limited. The schema has some ability in this regard for indexed values, but the stored data is 100% verbatim as Solr receives it. If you will be using the dataimport handler, it does have some transform capability before sending to Solr. Most of the time, the rule of thumb is that changing the data on the Solr side will require contrib/custom plugins, so it may be easier to do it before Solr receives it.
So the data import handler is a Solr server-side feature and not a client-side one? Does Solr or SolrJ have any support for doing transformations on the client side? Doing the above transformation should be fairly straightforward, so it could also be done by code on the client side. marko
JavaScript transform switch statement during Data Import
Hello - I'm trying to add a switch statement to a JavaScript function that we use during an import; it's to replace an if/else block that is becoming increasingly large. Bizarrely, the switch block is ignored entirely, and it doesn't have any effect whatsoever. Our version info: Solr Specification Version: 3.4.0.2011.09.09.09.06.17 Solr Implementation Version: 3.4.0 1167142 - mike - 2011-09-09 09:06:17 Lucene Specification Version: 3.4.0 Lucene Implementation Version: 3.4.0 1167142 - mike - 2011-09-09 09:02:09 I've tried searching, but can't find anything to suggest this is a known bug. Has anyone come across this before? Paul
Re: first time with new keyword, solr take to much time to give the result
Hi, Have you considered ManifoldCF? Otis -- SOLR Performance Monitoring - http://sematext.com/spm/index.html
Re: Solr 4.2.1 sorting by distance to polygon centre.
David, I just peeked at it on GitHub. The method will estimate well enough for our purpose, but it depends on JTS, which we include in our Solr server only; we don't want LGPL (v3) libraries in our main project, so that is kind of a show stopper. I understand it is needed for Spatial4j, Lucene and Solr in general, so we have no issue keeping it on the Solr server, but we can't put it in the main web project for licensing reasons. I know JTS is a great set of functions for spatial projects. It's a shame I can't use it directly; I had to develop a convex hull myself, for example. Guido.
On 16/04/13 16:14, Smiley, David W. wrote: On 4/16/13 10:57 AM, Guido Medina guido.med...@temetra.com wrote: David, I have been following your Stack Overflow posts and I understand what you say. We decided to change the criteria and index an extra field (close to your suggestion), so the sorting will now happen by polygon area, descending (which introduced another problem: calculating the area of a polygon on a sphere). I have finally got to the point of testing. Also, given what you are saying, it is not a good idea to ask for more than the bare use of point-inside-polygon (Intersects) to get the list that matches specific criteria.
Glad you've been following what I've been up to and hopefully haven't gotten too confused :-). I welcome all feedback. BTW I'll be doing a 75-minute spatial deep dive session at the Lucene/Solr Revolution conference in San Diego, May 1st and 2nd. Eventually the slides will be posted, and hopefully the audio track.
To summarize: calculate the area of the polygon (again, for curved polygons this is not so obvious), do the standard Solr search, and sort by that extra field; I guess the Solr overhead will be minimal in that case.
FYI Spatial4j will do a decent job estimating it by calculating the geospatial area of the bounding box of a polygon and scaling it by the fill ratio of the polygon's 2D area to its bbox. This logic is in Spatial4j's JtsGeometry.getArea(). So are you storing the area and sorting by it then? (Overhead is extremely minimal; this would just be an integer sort.)
The real use case is for the utility industry. Let's say users have areas where they take meter reads; readings are scheduled and assigned to the users whose area contains the meter's GPS location. Some users might cover big areas, and there may be smaller areas for other users inside those big areas, so we replaced distance-to-centre with area covered, which seemed simpler and easier.
You might want to consider doing both: sort by a function query that combines both factors in some clever way. ~ David
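For readers who, like Guido, cannot pull JTS into their own project, the estimate David describes (spherical area of the bounding box, scaled by how much of the box the polygon fills in 2D) can be sketched in plain Java. This is only an illustration of the idea, not Spatial4j's actual code: the class and method names are hypothetical, the polygon is assumed to be a simple ring of {lon, lat} vertices in degrees that does not cross the date line, and the bounding-box formula R^2 * dLon * (sin(maxLat) - sin(minLat)) is the standard area of a latitude/longitude rectangle on a sphere.

```java
/** Illustrative sketch only; not Spatial4j's implementation. */
final class AreaEstimate {

    /** Area of a lat/lon rectangle (degrees) on a sphere of the given radius. */
    static double bboxSphereArea(double minLon, double maxLon,
                                 double minLat, double maxLat, double radius) {
        double dLon = Math.toRadians(maxLon - minLon);
        double dSin = Math.sin(Math.toRadians(maxLat)) - Math.sin(Math.toRadians(minLat));
        return radius * radius * dLon * dSin;
    }

    /** Planar (shoelace) area, in square degrees, of a simple ring of {lon, lat} vertices. */
    static double planarArea(double[][] ring) {
        double sum = 0.0;
        for (int i = 0; i < ring.length; i++) {
            double[] a = ring[i];
            double[] b = ring[(i + 1) % ring.length]; // wraps around, so the ring may be open or closed
            sum += a[0] * b[1] - b[0] * a[1];
        }
        return Math.abs(sum) / 2.0;
    }

    /** Spherical bbox area scaled by the polygon's 2D fill ratio of that bbox. */
    static double estimate(double[][] ring, double radius) {
        if (ring.length < 3) {
            return 0.0; // not a polygon
        }
        double minLon = Double.POSITIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
        double minLat = Double.POSITIVE_INFINITY, maxLat = Double.NEGATIVE_INFINITY;
        for (double[] p : ring) {
            minLon = Math.min(minLon, p[0]);
            maxLon = Math.max(maxLon, p[0]);
            minLat = Math.min(minLat, p[1]);
            maxLat = Math.max(maxLat, p[1]);
        }
        double bboxPlanar = (maxLon - minLon) * (maxLat - minLat);
        if (!(bboxPlanar > 0.0)) {
            return 0.0; // degenerate polygon (all points on one line)
        }
        double fillRatio = planarArea(ring) / bboxPlanar;
        return bboxSphereArea(minLon, maxLon, minLat, maxLat, radius) * fillRatio;
    }
}
```

The resulting number could be computed at index time and stored in a plain numeric field, so the sort at query time stays an ordinary field sort.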
Re: using maven to deploy solr on tomcat
On 4/16/2013 8:47 AM, Adeel Qureshi wrote: The problem is that I need to deploy it on servers where I don't know what the absolute path will be. Basically, my goal is to load Solr with a different set of configuration files based on the environment it's in. Is there a better or different way to do this?
If you have zero control over the target machine, then you might have to live with your solr home being dictated by the location of the servlet container (Tomcat in this case). If you change the Tomcat startup script to use a different CWD and set TOMCAT_HOME so Tomcat works, that might be the solution, but I don't know what effect that might have on Spring or other applications. I see two real options other than changing the startup script:
1) Go with an absolute path like C:\main\resources\solr-dev, or perhaps /main/resources/solr-dev if you also don't know the OS platform. Tell the server owners that you will require a specific directory location for the Solr data.
2) Use ".." path segments, which gives you something like ../../main/resources/solr-dev for your solr home.
Thanks, Shawn
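To see why a relative solr home ends up wherever the JVM happens to be started, the resolution can be mimicked with java.nio. This is a rough sketch, not Solr's own lookup code: the class name and the /opt/tomcat/bin working directory are made up for illustration, and the paths are Unix-style, so results on Windows will look different.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: shows how a relative path depends on the working directory. */
final class SolrHomeResolution {

    /** Resolves a (possibly relative) solr home against a working directory. */
    static Path resolve(String workingDir, String solrHome) {
        // resolve() keeps an absolute solrHome as-is; normalize() collapses ".." segments
        return Paths.get(workingDir).resolve(solrHome).normalize();
    }

    public static void main(String[] args) {
        String cwd = System.getProperty("user.dir"); // the JVM's current working directory
        System.out.println(resolve(cwd, "src/main/resources/solr-dev"));
        System.out.println(resolve(cwd, "../../main/resources/solr-dev"));
        System.out.println(resolve(cwd, "/main/resources/solr-dev"));
    }
}
```

Only the absolute form gives the same answer regardless of how the servlet container was launched, which is the reason for option 1 above.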
Re: updateLog in Solr 4.2
: If I disable the update log in Solr 4.2 then I get the following exception:
: SEVERE: java.lang.NullPointerException at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190)
Hmmm... If you don't have updateLog and you run in SolrCloud mode, Solr should have given you a clean, clear error that updateLog is required in cloud mode. Can you please open a bug in Jira and attach your config files so we can try to figure out why this isn't happening? -Hoss
Re: Dynamic data model design questions
'obj1/child1/prop1_ss'
Try to stick to names that follow Java naming conventions: a letter or underscore followed by letters, digits, and underscores. There are places in Solr which have limited rules for names because they support additional syntax. In this case, replace your slashes with underscores. In general, Solr is much more friendly towards static data models. Yes, you can use dynamic fields, but use them in moderation. The more heavily you lean on them, the more likely it is that you will eventually become unhappy with Solr. How many fields are we talking about here? The trick with Solr is not to brute-force flatten your data model (as you appear to be doing), but to REDESIGN your data model so that it is more amenable to a flat data model and takes advantage of Solr's features. You can use multiple collections for different types of data. And you can simulate joins across tables by doing a sequence of queries (although it would be nice to have a SolrJ client-side method to do that in one API call). -- Jack Krupansky
Re: Dynamic data model design questions
On 4/16/2013 9:17 AM, Marko Asplund wrote: Shawn Heisey wrote: So, using a dynamic schema I'd flatten the following JSON object graph { 'id':'xyz123', 'obj1': { 'child1': { 'prop1': ['val1', 'val2', 'val3'], 'prop2': 123 }, 'prop3': 'val4' }, 'obj2': { 'child2': { 'prop3': true } } } to a Solr document something like this? { 'id':'xyz123', 'obj1/child1/prop1_ss': ['val1', 'val2', 'val3'], 'obj1/child1/prop2_i': 123, 'obj1/prop3_s': 'val4', 'obj2/child2/prop3_b': true }
How you flatten the data is up to you. You have to examine the data and how you want to use it in order to keep the number of fields to a manageable level but retain the flexibility you need. Side note: I would not use anything in a field name other than ASCII alphanumeric and underscore characters. Using special characters (like a slash) has been known to cause problems with some Solr features. Because Solr uses HTTP, there are also potential URL escaping issues. Within a single index, Solr uses a flat model, like a single database table with no relational capability. With two indexes, there is the limited join feature, but I am not familiar with how it works.
I'm using Java, so I'd probably push docs for indexing to Solr and do the searches using SolrJ, right?
That would be the most sensible approach. The SolrJ API is much more advanced than the APIs for other languages. This is because it is actually part of the Solr codebase and used by Solr internally.
The data import handler is a Solr server side feature and not a client side? Does Solr or SolrJ have any support for doing transformations on the client side? Doing the above transformation should be fairly straight forward, so it could be also done by code on the client side.
With SolrJ, you can do anything, because you write the code. You can do whatever you like to the data, then send it to Solr. The dataimport handler is indeed a server-side feature. It is a contrib module included in the Solr distribution; you have to add a jar to Solr to activate it. Thanks, Shawn
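As a rough illustration of the client-side flattening discussed in this thread, the nested object graph can be walked recursively and turned into flat field names before anything is sent to Solr. The sketch below follows the advice to join path segments with underscores rather than slashes and reuses the type suffixes from Marko's example (_ss, _i, _s, _b); the class name, the suffix mapping, and the special handling of the top-level id are assumptions made for this example, not anything Solr or SolrJ prescribes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: flattens a nested map into underscore-joined, suffixed field names. */
final class SolrFieldFlattener {

    static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", source, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> node, Map<String, Object> out) {
        for (Map.Entry<String, Object> e : node.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "_" + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                flattenInto(path, (Map<String, Object>) value, out); // descend into child object
            } else if (path.equals("id")) {
                out.put(path, value); // top-level id keeps its name (assumed to be the unique key)
            } else {
                out.put(path + suffixFor(value), value);
            }
        }
    }

    /** Assumed mapping from Java type to the dynamic-field suffixes used in the thread. */
    static String suffixFor(Object value) {
        if (value instanceof List) return "_ss";
        if (value instanceof Boolean) return "_b";
        if (value instanceof Integer) return "_i";
        return "_s";
    }
}
```

Applied to Marko's example object, this yields names such as obj1_child1_prop1_ss and obj2_child2_prop3_b. With SolrJ, each entry of the resulting map could then be passed to SolrInputDocument.addField(name, value), provided the schema defines matching dynamic fields.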
Re: Solr restart is taking more than 1 hour
Thanks for the detailed explanation.
Re: Solr 4.2.x replication events on slaves
: In Solr 3.x, I was relying on a postCommit call to a listener in the update handler to perform data update to caches, this data was used to perform 'realtime' filtering on the documents.
I can't find it at the moment, but IIRC this was a side effect of how snapshots are now loaded on slaves: there is no longer an explicit commit to read in the new index. For your use case, however, I think what would make more sense (and probably would have always made more sense) is to implement this using the newSearcher hook, which allows you to block usage of the new searcher until you have finished your hook logic. Alternatively, you can implement CacheRegenerator, which was specifically designed for warming caches on newSearcher events and gives you access to the current cache keys, so you can see what items were in the old caches to warm. -Hoss
Re: CloudSolrServer vs ConcurrentUpdateSolrServer for indexing
It sure increased the performance. Thanks for the input. ./zahoor
On 14-Apr-2013, at 10:13 PM, J Mohamed Zahoor zah...@indix.com wrote: Thanks. Will try multithreading with CloudSolrServer. ./zahoor
On 13-Apr-2013, at 9:11 PM, Mark Miller markrmil...@gmail.com wrote: On Apr 13, 2013, at 11:07 AM, J Mohamed Zahoor zah...@indix.com wrote: Hi, This question has come up many times on the list with lots of variations (which confuses me a lot). I am using Solr 4.1: one collection, 6 shards, 6 machines. I am using CloudSolrServer inside each mapper to index my documents. While it is working fine, I am trying to improve the indexing performance. Question is: 1) Is CloudSolrServer multithreaded?
No. The proper fast way to use it is to start many threads that all add docs to the same CloudSolrServer instance. In other words, currently, you must do the multithreading yourself. CloudSolrServer is thread safe.
2) Will using ConcurrentUpdateSolrServer increase indexing performance?
Yes, but at the cost of having to specify a server to talk to: if it goes down, so does your indexing. It's also not very great at reporting errors. Finally, using multiple threads and CloudSolrServer, you can approach the performance of ConcurrentUpdateSolrServer. - Mark
./Zahoor
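The pattern Mark recommends (several threads all adding documents through one shared, thread-safe client) can be sketched with a plain ExecutorService. To keep the example runnable without SolrJ on the classpath, the DocSink interface below is a hypothetical stand-in for the shared client; in real code its add call would be replaced by something like CloudSolrServer.add(SolrInputDocument). The class name, the round-robin split of the document list, and the thread count are all choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sketch: many worker threads feeding one shared, thread-safe client. */
final class ParallelIndexer {

    /** Hypothetical stand-in for a thread-safe client such as CloudSolrServer. */
    interface DocSink {
        void add(String doc) throws Exception;
    }

    /** Splits docs round-robin across worker threads that share one sink; returns docs sent. */
    static int indexAll(List<String> docs, DocSink sharedSink, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                results.add(pool.submit(() -> {
                    int sent = 0;
                    // worker t handles documents t, t + threads, t + 2 * threads, ...
                    for (int i = offset; i < docs.size(); i += threads) {
                        sharedSink.add(docs.get(i));
                        sent++;
                    }
                    return sent;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // a worker failure surfaces here as an ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because failures come back through Future.get(), the caller sees indexing errors directly, which speaks to the error-reporting weakness Mark mentions for ConcurrentUpdateSolrServer.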
Re: Troubles with solr replication
: Also when I checked the solr log.
: [org.apache.solr.handler.SnapPuller] Master at: http://192.168.2.204:8080/solr/replication is not available. Index fetch failed. Exception: Connection refused
: BTW, I was able to fetch the replication file with wget directly.
Are you certain that the network setup for your master and slave machines allows them to talk to each other? You said you could fetch the files from the master via wget, but I'm guessing you were running this from your local machine. Are you certain that when logged in to 192.168.2.174 you can reach port 8080 of 192.168.2.204? -Hoss
zkState changes too often
Hi, I am using SolrCloud (4.1) with 6 nodes. When I index documents from the mapper, and as the load increases, I see these messages in my mapper logs, which looks like it is slowing down my indexing speed:
2013-04-16 06:04:18,013 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (5)
2013-04-16 06:04:18,186 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (6)
2013-04-16 06:04:18,186 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (6)
2013-04-16 06:04:19,485 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:04:19,487 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:30,006 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:30,010 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (5)
2013-04-16 06:08:30,010 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:08:30,019 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (5)
2013-04-16 06:08:35,443 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:08:35,446 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (6)
2013-04-16 06:08:35,446 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:35,459 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (6)
2013-04-16 06:08:48,929 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:08:48,931 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 6)
2013-04-16 06:09:12,005 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 6)
2013-04-16 06:09:12,010 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (5)
2013-04-16 06:09:12,011 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:09:12,014 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (5)
2013-04-16 06:09:15,438 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 5)
2013-04-16 06:09:15,441 INFO org.apache.solr.common.cloud.ZkStateReader: Updating live nodes... (6)
2013-04-16 06:09:15,441 INFO org.apache.solr.common.cloud.ZkStateReader: A cluster state change: WatchedEvent stat
I tried increasing the ZK timeout from 15 to 20 sec, but I still see this message. Is there anything I might try to avoid this? ./Zahoor
Document Missing from Share in Solr cloud
Hi, We noticed a strange behavior in our SolrCloud setup; we are using Solr 4.2 with a 1:3 replication setting. Some of the documents were showing up in search sometimes and not at other times, the reason being that the document was not present in all the shards. We have restarted ZooKeeper and also the entire cloud, but for some reason these documents are still not being replicated to all the shards, hence the inconsistent search results. Regards, Ayush
Re: Document Missing from Share in Solr cloud
If you are using the default doc router for indexing in SolrCloud, then a document only exists in a single shard but can be replicated in that shard to any number of replicas. Can you clarify your question as it sounds like you're saying that the document is not replicated across all the replicas for a specific shard? If so, that's definitely a problem ...
Re: zkState changes too often
Are you using the concurrent low-pause garbage collector, or perhaps G1? Are you able to use something like VisualVM to pinpoint what the bottleneck might be? Otherwise, keep raising the timeout. This means Solr and ZK are not able to talk for that much time: either something needs to be tuned or the time allowed needs to be raised. - Mark
RE: Document Missing from Share in Solr cloud
That's what I am trying to say: the document is not replicated across all the replicas for a specific shard, hence the query shows different results on every refresh.
Re: Solr 4.2.1 sorting by distance to polygon centre.
Guido, I encourage you to try to contribute the shape-related code you have to Spatial4j as open source. I realize that for some organizations, that can be really difficult. ~ David On 4/16/13 11:55 AM, Guido Medina guido.med...@temetra.com wrote: David, I just took a peek at it on GitHub. The method will estimate well enough for our purpose, but it depends on JTS, which we include only in our Solr server. We don't want LGPL (v3) libraries in our main project, which is kind of a show stopper. I understand that JTS is needed for Spatial4j, Lucene and Solr in general, so we have no issue keeping it on the Solr server, but we can't put it in our main web project because of licensing. I know JTS is a great set of functions for spatial projects. It's a shame I can't use it directly; I had to develop a convex hull implementation myself. Guido.
Re: Document Missing from Share in Solr cloud
Ok, that makes more sense and is definitely cause for concern. Do you have a sense for whether this is ongoing or happened a few times unexpectedly in the past? If ongoing, then will probably be easier to track down the root cause. On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi cooltec...@outlook.com wrote: That's what I am trying to say, the document is not replicated across all the replicas for a specific shard, hence the query show different results on every refresh. Date: Tue, 16 Apr 2013 11:34:18 -0600 Subject: Re: Document Missing from Share in Solr cloud From: thelabd...@gmail.com To: solr-user@lucene.apache.org If you are using the default doc router for indexing in SolrCloud, then a document only exists in a single shard but can be replicated in that shard to any number of replicas. Can you clarify your question as it sounds like you're saying that the document is not replicated across all the replicas for a specific shard? If so, that's definitely a problem ... On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi cooltec...@outlook.com wrote: Hi, We noticed a strange behavior in our solr cloud setup, we are using solr4.2 with 1:3 replication setting. We noticed that some of the documents were showing up in search sometimes and not at other, the reason being the document was not present in all the shards. We have restarted zookeeper and also entire cloud, but these documents are not being replicated in all the shards for some reason and hence inconsistent search results. Regards, Ayush
Re: updateLog in Solr 4.2
Can you file a JIRA issue? - minimum you should get a better error. - Mark On Apr 12, 2013, at 9:17 AM, vicky desai vicky.de...@germinait.com wrote: If i disable update log in solr 4.2 then i get the following exception SEVERE: :java.lang.NullPointerException at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266) at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935) at org.apache.solr.cloud.ZkController.register(ZkController.java:761) at org.apache.solr.cloud.ZkController.register(ZkController.java:727) at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908) at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892) at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Apr 12, 2013 6:39:56 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.cloud.ZooKeeperException: at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931) at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892) at 
org.apache.solr.core.CoreContainer.register(CoreContainer.java:841) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.NullPointerException at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:266) at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:935) at org.apache.solr.cloud.ZkController.register(ZkController.java:761) at org.apache.solr.cloud.ZkController.register(ZkController.java:727) at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908) ... 12 more and solr fails to start . However if i add updatelog in my solrconfig.xml it starts. Is the update log parameter mandatory for solr4.2 -- View this message in context: http://lucene.472066.n3.nabble.com/updateLog-in-Solr-4-2-tp4055548.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud Leader Response Mechanism
Leaders don't have much to do with querying - the node that you query will determine what other nodes it has to query to search the whole index and do a scatter/gather for you. (Though in some cases that request can be proxied to another node) - Mark On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: When a leader responses for a query, does it says that: If I have the data what I am looking for, I should build response with it, otherwise I should find it anywhere. Because it may be long to search it? or does it says I only index the data, I will tell it to other guys to build up the response query?
solr 3.5 core rename issue
We just tried to use .../solr/admin/cores?action=RENAME&core=core0&other=core5 to rename a core from 'old' to 'new'. After the request completes, solr.xml has the new core name, and the Solr admin page shows the new core name in the list. But the index directory still has the old name. I looked into the Solr 3.5 code, and this is what the code does. However, if I bounce Tomcat/Solr, then on startup Solr creates a new index directory named 'new', and of course searching the core no longer returns any documents. Is this a bug, or did I miss something? Thanks, Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425.html
Re: Document Missing from Share in Solr cloud
btw ... what is the field type of your unique ID field? On Tue, Apr 16, 2013 at 12:34 PM, Timothy Potter thelabd...@gmail.com wrote: Ok, that makes more sense and is definitely cause for concern. Do you have a sense for whether this is ongoing or happened a few times unexpectedly in the past? If ongoing, then will probably be easier to track down the root cause. On Tue, Apr 16, 2013 at 12:08 PM, Cool Techi cooltec...@outlook.com wrote: That's what I am trying to say, the document is not replicated across all the replicas for a specific shard, hence the query show different results on every refresh. Date: Tue, 16 Apr 2013 11:34:18 -0600 Subject: Re: Document Missing from Share in Solr cloud From: thelabd...@gmail.com To: solr-user@lucene.apache.org If you are using the default doc router for indexing in SolrCloud, then a document only exists in a single shard but can be replicated in that shard to any number of replicas. Can you clarify your question as it sounds like you're saying that the document is not replicated across all the replicas for a specific shard? If so, that's definitely a problem ... On Tue, Apr 16, 2013 at 11:22 AM, Cool Techi cooltec...@outlook.com wrote: Hi, We noticed a strange behavior in our solr cloud setup, we are using solr4.2 with 1:3 replication setting. We noticed that some of the documents were showing up in search sometimes and not at other, the reason being the document was not present in all the shards. We have restarted zookeeper and also entire cloud, but these documents are not being replicated in all the shards for some reason and hence inconsistent search results. Regards, Ayush
Re: solr 3.5 core rename issue
On 4/16/2013 2:02 PM, Jie Sun wrote: We just tried to use .../solr/admin/cores?action=RENAME&core=core0&other=core5 to rename a core 'old' to 'new'. After the request is done, the solr.xml has new core name, and the solr admin shows the new core name in the list. But the index dir still has the old name as the directory name. I looked into solr 3.5 code, this is what the code does. However, if I bounce tomcat/solr, when solr is started up, it creates new index dir with 'new', and now of course there is no longer any document returned if you search the core. is this a bug? or did I miss anything? If your solr.xml is missing the 'persistent' attribute on the solr tag, or it is set to false, then I can imagine it behaving this way. This must be set to true, or changes that you make with the core admin API will not be written to disk, so they will not survive a restart. <solr sharedLib="lib" persistent="true"> <cores adminPath="/admin/cores"> I haven't used the RENAME functionality, but I use the core SWAP feature extensively. I have cores with names like s0live and s0build, but they actually refer to directories with names like s0_0 and s0_1. When they swap, the directory location of the index doesn't change, but it's like I have renamed both of them with each other's name. Thanks, Shawn
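[Editor's sketch, not part of the original thread.] The two points above (the 'persistent' attribute on the solr tag, and the shape of the core admin RENAME request) can be checked with a few lines of Python. This is only an illustration under stated assumptions: the sample solr.xml content, the helper names is_persistent and core_admin_url, and the base URL http://localhost:8983/solr are hypothetical and do not come from the thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical sample in the legacy solr.xml layout discussed in this thread.
SAMPLE_SOLR_XML = """<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
  </cores>
</solr>"""


def is_persistent(solr_xml_text):
    """Return True if the <solr> root element carries persistent="true".

    A missing attribute is treated as false, matching the behaviour
    described in the message above.
    """
    root = ET.fromstring(solr_xml_text)
    return root.get("persistent", "false").strip().lower() == "true"


def core_admin_url(base, action, **params):
    """Build a core admin request URL; parameters are joined with '&'."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/cores?{query}"


if __name__ == "__main__":
    print(is_persistent(SAMPLE_SOLR_XML))
    # The base URL is an assumption; substitute your own host and context path.
    print(core_admin_url("http://localhost:8983/solr", "RENAME",
                         core="core0", other="core5"))
```

The sketch only inspects configuration and builds a URL; it does not contact a Solr server, and it says nothing about whether RENAME moves the index directory, which is the open question in this thread.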
Re: solr 3.5 core rename issue
Hi Shawn, I do have persistent="true" in my solr.xml: <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="true"> <cores adminPath="/admin/cores"> <core name="default" instanceDir="./"/> <core name="413a" instanceDir="./"/> <core name="blah" instanceDir="./"/> ... </cores> </solr> The command I ran was to rename '413' to '413a'. When I debug through Solr's CoreAdminHandler, I notice that the persistent flag only controls whether the new data is persisted to solr.xml. As you can see, it did change my solr.xml, so there is no problem there. But the index directory is not changed at all (it is still '413'). I guess SWAP will have a similar issue: I bet your 's0_0' directory actually holds the data for core s0build, and s0_1 holds the data for s0live, after you swap them, because I don't see anywhere in the CoreAdminHandler and CoreContainer code that actually renames the index directory. I might be wrong, but you can test and find out. Jie -- View this message in context: http://lucene.472066.n3.nabble.com/solr-3-5-core-rename-issue-tp4056425p4056435.html
Re: SolrCloud Leader Response Mechanism
Hi Mark; To put it in proper terms, what I want to ask is: is there data locality or spatial locality ( http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html - I mean, if you have the data on your machine, use it and don't look for it anywhere else; just search elsewhere for the remaining parts) when querying a leader in SolrCloud? 2013/4/16 Mark Miller markrmil...@gmail.com Leaders don't have much to do with querying - the node that you query will determine what other nodes it has to query to search the whole index and do a scatter/gather for you. (Though in some cases that request can be proxied to another node) - Mark On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: When a leader responses for a query, does it says that: If I have the data what I am looking for, I should build response with it, otherwise I should find it anywhere. Because it may be long to search it? or does it says I only index the data, I will tell it to other guys to build up the response query?
Why indexing and querying performance is better at SolrCloud compared to older versions of Solr?
Is there any document that describes why indexing and querying performance is better in SolrCloud than in older versions of Solr? I was considering this architecture: one cloud of Solr servers that only does indexing, and another cloud that copies those indexes and only serves queries, to get better performance. However, if I use SolrCloud, I think there is no need to build such an architecture.
Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
Hi Otis and Jack; I have done some research on highlighting and debugged the code. I see that highlights are query dependent and are not stored. Why does Solr use Lucene for storing text, i.e. the content of a web page? Is there any comparison of storing text in HBase or another database versus Lucene? I would also like to know whether anybody on the solr-user list has used something other than Lucene to store document text. 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Source code is your best bet. Wiki has info about how to use it, but not how highlighting is implemented. But you don't need to understand the implementation details to understand that they are dynamic, computed specifically for each query for each matching document, so you cannot store them anywhere ahead of time. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; It seems that I should read more about highlights. Is there any where that explains in detail how highlights are generated at Solr? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Hi, You can't store highlights ahead of time because they are query dependent. You could store documents in HBase and use Solr just for indexing. Is that what you want to do? If so, a custom SearchComponent executed after QueryComponent could fetch data from external store like HBase. I'm not sure if I'd recommend that. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com wrote: Actually I don't think to store documents at Solr. I want to store just highlights (snippets) at Hbase and I want to retrieve them from Hbase when needed. What do you think about separating just highlights from Solr and storing them into Hbase at Solrclod. By the way if you explain at which process and how highlights are genareted at Solr you are welcome. 
2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com You may also be interested in looking at things like solrbase (on Github). Otis -- Solr ElasticSearch Support http://sematext.com/ On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; First of all should mention that I am new to Solr and making a research about it. What I am trying to do that I will crawl some websites with Nutch and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 ) I wonder about something. I have a cloud of machines that crawls websites and stores that documents. Then I send that documents into SolrCloud. Solr indexes that documents and generates indexes and save them. I know that from Information Retrieval theory: it *may* not be efficient to store indexes at a NoSQL database (they are something like linked lists and if you store them in such kind of database you *may* have a sparse representation -by the way there may be some solutions for it. If you explain them you are welcome.) However Solr stores some documents too (i.e. highlights) So some of my documents will be doubled somehow. If I consider that I will have many documents, that dobuled documents may cause a problem for me. So is there any way not storing that documents at Solr and pointing to them at Hbase(where I save my crawled documents) or instead of pointing directly storing them at Hbase (is it efficient or not)?
Re: Empty Solr 4.2.1 can not create Collection
: sorry for pushing, but I just replayed the steps with solr 4.0 where : everything works fine. : Then I switched to solr 4.2.1 and replayed the exact same steps and the : collection won't start and no leader will be elected. : : Any clues ? : Should I try it on the developer mailing list, maybe it's a bug ? I'm not really understanding what the sequence of events is that's leading you to this error, but if you can reproduce a problem in which there is no leader election (and you get the NPE listed below) when creating a collection then yes, absolutely, please open a Jira and include... 1) the specific list of steps to reproduce starting from a 4.2.1 install 2) the configs you start with as well as any configs you are specifying when creating collections 3) snapshots of clusterstate.json taken before and after you encounter the problem 4) logs from each of the solr servers you run in your test. : : Kind Regards : Alexander : : On 2013-04-10 22:27, A.Eibner wrote: : Hi, : : here the clusterstate.json (from zookeeper) after creating the core: : : {storage:{ : shards:{shard1:{ : range:8000-7fff, : state:active, : replicas:{app02:9985_solr_storage-core:{ : shard:shard1, : state:down, : core:storage-core, : collection:storage, : node_name:app02:9985_solr, : base_url:http://app02:9985/solr, : router:compositeId}} : cZxid = 0x10024 : ctime = Wed Apr 10 22:18:13 CEST 2013 : mZxid = 0x1003d : mtime = Wed Apr 10 22:21:26 CEST 2013 : pZxid = 0x10024 : cversion = 0 : dataVersion = 2 : aclVersion = 0 : ephemeralOwner = 0x0 : dataLength = 467 : numChildren = 0 : : But looking in the log files I found the following error (this also : occures with the collection api) : : SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore : 'storage_shard1_replica1': : at : org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:483) : : at : org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:140) : : at : 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) : : at : org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:591) : : at : org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:192) : : at : org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141) : : at : org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) : : at : org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) : : at : org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225) : : at : org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169) : : at : org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168) : : at : org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) : : at : org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) : : at : org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407) : at : org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:999) : : at : org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:565) : : at : org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:307) : : at : java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) : : at : java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) : : at java.lang.Thread.run(Thread.java:722) : Caused by: org.apache.solr.common.cloud.ZooKeeperException: : at : org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:931) : at : org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892) : at : org.apache.solr.core.CoreContainer.register(CoreContainer.java:841) : at : 
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:479) : : ... 19 more : Caused by: java.lang.NullPointerException : at : org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:190) : : at : org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:156) : : at : org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:100) : : at :
When a search query comes to a replica what happens?
I want to make this clear in my mind: when a search query arrives at a replica, what happens? - Does the replica forward the search query to the leader, which then collects all the data and prepares the response (this would cause a performance issue, because the leader is responsible for indexing at the same time), or - does the replica communicate with the leader to learn where the remaining data is (the leader asks ZooKeeper and tells the replica), and then the replica collects all the data and builds the response?
How SolrCloud Balance Number of Documents at each Shard?
Is it possible for different shards to hold different numbers of documents, or does SolrCloud balance them? I ask because I want to learn the mechanism behind how Solr calculates the hash value of a document's identifier. Is it possible that the hash function routes more documents to one shard than to the others? (This may cause a bottleneck at some SolrCloud leaders.)
Re: When a search query comes to a replica what happens?
Hi, No, I believe redirect from replica to leader would happen only at index time, so a doc first gets indexed to leader and from there it's replicated to non-leader shards. At query time there is no redirect to leader, I imagine, as that would quickly turn leaders into hotspots. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: I want to make it clear in my mind: When a search query comes to a replica what happens? -Does it forwards the search query to leader and leader collects all the data and prepares response (this will cause a performance issue because leader is responsible for indexing at same time) or - replica communicates with leader and learns where is remaining data(leaders asks to Zookeper and tells it to replica) and replica collects all data and response it?
Re: How SolrCloud Balance Number of Documents at each Shard?
They won't be exact, but should be close. Are you seeing some *big* differences? Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com wrote: Is it possible that different shards have different number of documents or does SolrCloud balance them? I ask this question because I want to learn the mechanism behind how Solr calculete hash value of the identifier of the document. Is it possible that hash function produces more documents into one of the shards other than any of shards. (because this may cause a bottleneck at some leaders of SolrCloud)
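[Editor's sketch, not part of the original thread.] The "not exact, but close" behaviour described above is what one would expect from any hash-based router. The following Python sketch illustrates the general idea only: it hashes each document id into a 32-bit space that is split into equal contiguous ranges, one per shard. The hash used here (the first four bytes of an MD5 digest) is a stand-in chosen for convenience and is an assumption, not Solr's actual hash function; the names shard_for and distribution are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index.

    The id is hashed into a 32-bit space, which is divided into
    num_shards equal, contiguous ranges (one range per shard).
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")   # value in 0 .. 2**32 - 1
    return h * num_shards // 2**32          # value in 0 .. num_shards - 1


def distribution(doc_ids, num_shards):
    """Count how many of the given ids land on each shard."""
    counts = Counter(shard_for(d, num_shards) for d in doc_ids)
    return [counts.get(s, 0) for s in range(num_shards)]


if __name__ == "__main__":
    counts = distribution((f"doc-{i}" for i in range(10000)), 4)
    print(counts)  # roughly, but not exactly, 2500 documents per shard
```

Because the hash depends only on the id, the same document always goes to the same shard, and a large set of distinct ids spreads out nearly evenly without any central balancing step.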
Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
People do use other data stores to retrieve data sometimes. e.g. Mongo is popular for that. Like I hinted in another email, I wouldn't necessarily recommend this for common cases. Don't do it unless you really know you need it. Otherwise, just store in Solr. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis and Jack; I have made a research about highlights and debugged code. I see that highlight are query dependent and not stored. Why Solr uses Lucene for storing text, I mean i.e. content of a web page. Is there any comparison about to store texts at Hbase or any other databases versus Lucene. Also I want to learn that is there anybody who has used anything else from Lucene to store text of document at our solr user list? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Source code is your best bet. Wiki has info about how to use it, but not how highlighting is implemented. But you don't need to understand the implementation details to understand that they are dynamic, computed specifically for each query for each matching document, so you cannot store them anywhere ahead of time. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; It seems that I should read more about highlights. Is there any where that explains in detail how highlights are generated at Solr? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Hi, You can't store highlights ahead of time because they are query dependent. You could store documents in HBase and use Solr just for indexing. Is that what you want to do? If so, a custom SearchComponent executed after QueryComponent could fetch data from external store like HBase. I'm not sure if I'd recommend that. 
Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com wrote: Actually I don't think to store documents at Solr. I want to store just highlights (snippets) at Hbase and I want to retrieve them from Hbase when needed. What do you think about separating just highlights from Solr and storing them into Hbase at Solrclod. By the way if you explain at which process and how highlights are genareted at Solr you are welcome. 2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com You may also be interested in looking at things like solrbase (on Github). Otis -- Solr ElasticSearch Support http://sematext.com/ On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; First of all should mention that I am new to Solr and making a research about it. What I am trying to do that I will crawl some websites with Nutch and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 ) I wonder about something. I have a cloud of machines that crawls websites and stores that documents. Then I send that documents into SolrCloud. Solr indexes that documents and generates indexes and save them. I know that from Information Retrieval theory: it *may* not be efficient to store indexes at a NoSQL database (they are something like linked lists and if you store them in such kind of database you *may* have a sparse representation -by the way there may be some solutions for it. If you explain them you are welcome.) However Solr stores some documents too (i.e. highlights) So some of my documents will be doubled somehow. If I consider that I will have many documents, that dobuled documents may cause a problem for me. So is there any way not storing that documents at Solr and pointing to them at Hbase(where I save my crawled documents) or instead of pointing directly storing them at Hbase (is it efficient or not)?
Re: Why indexing and querying performance is better at SolrCloud compared to older versions of Solr?
Correct. With SolrCloud you typically don't need to make this separation (with ElasticSearch one can designate some nodes as non-data nodes). SolrCloud won't necessarily always be faster because it typically involves sharding and thus a distributed search, while some non-SolrCloud setups can hold the whole index locally and thus avoid the network part. General (and friendly!) comment - you may find it faster/cheaper/more efficient to just pick the approach and do it, unless you are really doing this purely to learn the theory. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:27 PM, Furkan KAMACI furkankam...@gmail.com wrote: Is there any document that describes why indexing and querying performance is better at SolrCloud compared to older versions of Solr? I was examining that architecture to use: there will be a cloud of Solr that just do indexing and there will be another cloud that copies that indexes into them and just to querying because of to get better performance. However if I use SolrCloud I think that there is no need to build up an architecture such like it.
Re: SolrCloud Leader Response Mechanism
If query comes to shard X on some node and this shard X is NOT a leader, but HAS data, it will just execute the query. If it needs to query shards on other nodes, it will have the info about which shards to query and will just do that and aggregate the results. It doesn't have to ask leader for permission, for info, etc. It can just do it because it knows where things are. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; When I speak with proper terms I want to ask that: is there a data locality of spatial locality ( http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html - I mean if you have data on your machine, use it and don't search it anywhere else, just search for remaining parts) at querying on a leader of SolrCloud? 2013/4/16 Mark Miller markrmil...@gmail.com Leaders don't have much to do with querying - the node that you query will determine what other nodes it has to query to search the whole index and do a scatter/gather for you. (Though in some cases that request can be proxied to another node) - Mark On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: When a leader responses for a query, does it says that: If I have the data what I am looking for, I should build response with it, otherwise I should find it anywhere. Because it may be long to search it? or does it says I only index the data, I will tell it to other guys to build up the response query?
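[Editor's sketch, not part of the original thread.] The scatter/gather flow described above can be modelled in a few lines of Python: whichever node receives the query searches every shard (any replica will do, leader or not) and merges the partial results. Everything in this sketch is an assumption made for illustration: the in-memory shard dictionaries, the toy term-count score (which is not Lucene's scoring), and the function names search_shard and scatter_gather.

```python
import heapq


def search_shard(shard_docs, term):
    """Return (score, doc_id) pairs for docs in one shard containing term.

    The score is a plain term count, used only to make merging visible.
    """
    hits = []
    for doc_id, text in shard_docs.items():
        tf = text.lower().split().count(term.lower())
        if tf:
            hits.append((tf, doc_id))
    return hits


def scatter_gather(shards, term, rows=10):
    """Query every shard, then merge the partial results into one top list."""
    # Scatter: one request per shard; no leader is consulted.
    partial = [search_shard(docs, term) for docs in shards.values()]
    # Gather: merge all partial hits and keep the best `rows` by score.
    merged = heapq.nlargest(rows, (hit for hits in partial for hit in hits))
    return [doc_id for _score, doc_id in merged]


if __name__ == "__main__":
    shards = {
        "shard1": {"a": "solr cloud solr", "b": "zookeeper"},
        "shard2": {"c": "solr", "d": "solr solr solr"},
    }
    print(scatter_gather(shards, "solr", rows=2))
```

In this model the node that receives the request does all the coordination itself, which matches the point made above that it does not need to ask a leader for permission or for information.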
Re: When a search query comes to a replica what happens?
All in all, will a replica ask its leader where the remaining data is, or will it ask ZooKeeper directly? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com Hi, No, I believe redirect from replica to leader would happen only at index time, so a doc first gets indexed to leader and from there it's replicated to non-leader shards. At query time there is no redirect to leader, I imagine, as that would quickly turn leaders into hotspots. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: I want to make it clear in my mind: When a search query comes to a replica what happens? -Does it forwards the search query to leader and leader collects all the data and prepares response (this will cause a performance issue because leader is responsible for indexing at same time) or - replica communicates with leader and learns where is remaining data(leaders asks to Zookeper and tells it to replica) and replica collects all data and response it?
Re: When a search query comes to a replica what happens?
No. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:23 PM, Furkan KAMACI furkankam...@gmail.com wrote: All in all will replica ask to its leader about where is remaining of data or it directly asks to Zookeper? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com Hi, No, I believe redirect from replica to leader would happen only at index time, so a doc first gets indexed to leader and from there it's replicated to non-leader shards. At query time there is no redirect to leader, I imagine, as that would quickly turn leaders into hotspots. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: I want to make it clear in my mind: When a search query comes to a replica what happens? -Does it forwards the search query to leader and leader collects all the data and prepares response (this will cause a performance issue because leader is responsible for indexing at same time) or - replica communicates with leader and learns where is remaining data(leaders asks to Zookeper and tells it to replica) and replica collects all data and response it?
Re: How SolrCloud Balance Number of Documents at each Shard?
Hi Otis; Firstly thanks for your answers. So do you mean that hashing mechanism will randomly route a document into a randomly shard? I want to ask it because I consider about putting a load balancer in front of my SolrCloud and manually route some documents into some other shards to avoid bottleneck. 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com They won't be exact, but should be close. Are you seeing some *big* differences? Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com wrote: Is it possible that different shards have different number of documents or does SolrCloud balance them? I ask this question because I want to learn the mechanism behind how Solr calculete hash value of the identifier of the document. Is it possible that hash function produces more documents into one of the shards other than any of shards. (because this may cause a bottleneck at some leaders of SolrCloud)
Re: Push/pull model between leader and replica in one shard
Hi,

When everything is working well, replication is push:
* request comes to any node, ideally the leader
* doc is indexed on the leader
* doc is copied to the replicas

If a replica falls too far behind (I'm not exactly sure what the "too far" threshold is), it uses pull to replicate the whole index from the leader. Mark can answer the part about where the tlog gets replayed to catch up on docs that were missed while the big index replication pull was happening.

This is a good thread to read on this topic: http://search-lucene.com/m/y1yj218J2v82

Otis -- Solr ElasticSearch Support http://sematext.com/

On Tue, Apr 16, 2013 at 1:36 AM, SuoNayi suonayi2...@163.com wrote: Hi, can someone explain in more detail what model is used to sync docs between the leader and the replicas in a shard? The model can be push or pull. Suppose I have only one shard that has 1 leader and 2 replicas. When the leader receives an update request, does it first scatter the request to each available and active replica and then process the request locally? In this case, if the replicas are able to keep up with the leader, can I think of this as a push model in which the leader pushes updates to its replicas? What happens if a replica is behind the leader? Will the replica pull docs from the leader and keep track of the incoming updates from the leader in a log (called the tlog)? If so, when it completes pulling docs, will it replay the updates in the tlog? regards
Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
Thanks again for your answer. If I find any document about such comparisons that I would like to read. By the way, is there any advantage for using Lucene instead of anything else as like that: Using Lucene is naturally supported at Solr and if I use anything else I may face with some compatibility problems or communicating issues? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com People do use other data stores to retrieve data sometimes. e.g. Mongo is popular for that. Like I hinted in another email, I wouldn't necessarily recommend this for common cases. Don't do it unless you really know you need it. Otherwise, just store in Solr. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis and Jack; I have made a research about highlights and debugged code. I see that highlight are query dependent and not stored. Why Solr uses Lucene for storing text, I mean i.e. content of a web page. Is there any comparison about to store texts at Hbase or any other databases versus Lucene. Also I want to learn that is there anybody who has used anything else from Lucene to store text of document at our solr user list? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Source code is your best bet. Wiki has info about how to use it, but not how highlighting is implemented. But you don't need to understand the implementation details to understand that they are dynamic, computed specifically for each query for each matching document, so you cannot store them anywhere ahead of time. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; It seems that I should read more about highlights. Is there any where that explains in detail how highlights are generated at Solr? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Hi, You can't store highlights ahead of time because they are query dependent. 
You could store documents in HBase and use Solr just for indexing. Is that what you want to do? If so, a custom SearchComponent executed after QueryComponent could fetch data from external store like HBase. I'm not sure if I'd recommend that. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com wrote: Actually I don't think to store documents at Solr. I want to store just highlights (snippets) at Hbase and I want to retrieve them from Hbase when needed. What do you think about separating just highlights from Solr and storing them into Hbase at Solrclod. By the way if you explain at which process and how highlights are genareted at Solr you are welcome. 2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com You may also be interested in looking at things like solrbase (on Github). Otis -- Solr ElasticSearch Support http://sematext.com/ On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; First of all should mention that I am new to Solr and making a research about it. What I am trying to do that I will crawl some websites with Nutch and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 ) I wonder about something. I have a cloud of machines that crawls websites and stores that documents. Then I send that documents into SolrCloud. Solr indexes that documents and generates indexes and save them. I know that from Information Retrieval theory: it *may* not be efficient to store indexes at a NoSQL database (they are something like linked lists and if you store them in such kind of database you *may* have a sparse representation -by the way there may be some solutions for it. If you explain them you are welcome.) However Solr stores some documents too (i.e. highlights) So some of my documents will be doubled somehow. If I consider that I will have many documents, that dobuled documents may cause a problem for me. 
So is there any way not storing that documents at Solr and pointing to them at Hbase(where I save my crawled documents) or instead of pointing directly storing them at Hbase (is it efficient or not)?
Re: How SolrCloud Balance Number of Documents at each Shard?
Hi, Routing is not random... have a look at https://issues.apache.org/jira/browse/SOLR-2341 . In short, you shouldn't have to route manually from your app. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:26 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; Firstly thanks for your answers. So do you mean that hashing mechanism will randomly route a document into a randomly shard? I want to ask it because I consider about putting a load balancer in front of my SolrCloud and manually route some documents into some other shards to avoid bottleneck. 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com They won't be exact, but should be close. Are you seeing some *big* differences? Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:11 PM, Furkan KAMACI furkankam...@gmail.com wrote: Is it possible that different shards have different number of documents or does SolrCloud balance them? I ask this question because I want to learn the mechanism behind how Solr calculete hash value of the identifier of the document. Is it possible that hash function produces more documents into one of the shards other than any of shards. (because this may cause a bottleneck at some leaders of SolrCloud)
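To make the "routing is not random" point concrete, here is a minimal sketch of hash-range routing. It is only an illustration under stated assumptions: each shard owns a contiguous slice of a 32-bit hash space (compare the range entry in a clusterstate.json), and a document goes to the shard whose slice contains the hash of its id. The hash below is a stand-in (first 4 bytes of MD5), not the function Solr actually uses, and the shard count and "doc-N" ids are hypothetical.

```python
import hashlib

NUM_SHARDS = 4          # hypothetical cluster size
HASH_SPACE = 2 ** 32    # size of the 32-bit hash space


def shard_ranges(num_shards):
    """Split the hash space into contiguous, near-equal (start, end) ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def doc_hash(doc_id):
    """Stand-in hash for illustration only; NOT the hash Solr uses."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def route(doc_id, ranges):
    """Return the index of the shard whose range contains the id's hash."""
    h = doc_hash(doc_id)
    for shard, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return shard
    raise ValueError("hash outside all ranges")


def distribution(num_docs, num_shards=NUM_SHARDS):
    """Count how many of the ids doc-0..doc-(n-1) land on each shard."""
    ranges = shard_ranges(num_shards)
    counts = [0] * num_shards
    for i in range(num_docs):
        counts[route("doc-%d" % i, ranges)] += 1
    return counts
```

The same id always maps to the same shard, and with a well-mixing hash the per-shard counts come out close to each other but not identical, which matches "they won't be exact, but should be close".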
Re: SolrCloud Leader Response Mechanism
Hi Otis; You said: It can just do it because it knows where things are. Does it learn it from Zookeeper? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com If query comes to shard X on some node and this shard X is NOT a leader, but HAS data, it will just execute the query. If it needs to query shards on other nodes, it will have the info about which shards to query and will just do that and aggregate the results. It doesn't have to ask leader for permission, for info, etc. It can just do it because it knows where things are. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; When I speak with proper terms I want to ask that: is there a data locality of spatial locality ( http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html - I mean if you have data on your machine, use it and don't search it anywhere else, just search for remaining parts) at querying on a leader of SolrCloud? 2013/4/16 Mark Miller markrmil...@gmail.com Leaders don't have much to do with querying - the node that you query will determine what other nodes it has to query to search the whole index and do a scatter/gather for you. (Though in some cases that request can be proxied to another node) - Mark On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: When a leader responses for a query, does it says that: If I have the data what I am looking for, I should build response with it, otherwise I should find it anywhere. Because it may be long to search it? or does it says I only index the data, I will tell it to other guys to build up the response query?
Re: Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
Use Solr. It's pretty clear you don't yet have any problems that would make you think about alternatives. Using Solr to store and not just index will make your life simpler (and your app simpler and likely faster). Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:31 PM, Furkan KAMACI furkankam...@gmail.com wrote: Thanks again for your answer. If I find any document about such comparisons that I would like to read. By the way, is there any advantage for using Lucene instead of anything else as like that: Using Lucene is naturally supported at Solr and if I use anything else I may face with some compatibility problems or communicating issues? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com People do use other data stores to retrieve data sometimes. e.g. Mongo is popular for that. Like I hinted in another email, I wouldn't necessarily recommend this for common cases. Don't do it unless you really know you need it. Otherwise, just store in Solr. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:32 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis and Jack; I have made a research about highlights and debugged code. I see that highlight are query dependent and not stored. Why Solr uses Lucene for storing text, I mean i.e. content of a web page. Is there any comparison about to store texts at Hbase or any other databases versus Lucene. Also I want to learn that is there anybody who has used anything else from Lucene to store text of document at our solr user list? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Source code is your best bet. Wiki has info about how to use it, but not how highlighting is implemented. But you don't need to understand the implementation details to understand that they are dynamic, computed specifically for each query for each matching document, so you cannot store them anywhere ahead of time. 
Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; It seems that I should read more about highlights. Is there any where that explains in detail how highlights are generated at Solr? 2013/4/11 Otis Gospodnetic otis.gospodne...@gmail.com Hi, You can't store highlights ahead of time because they are query dependent. You could store documents in HBase and use Solr just for indexing. Is that what you want to do? If so, a custom SearchComponent executed after QueryComponent could fetch data from external store like HBase. I'm not sure if I'd recommend that. Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Apr 11, 2013 at 10:01 AM, Furkan KAMACI furkankam...@gmail.com wrote: Actually I don't think to store documents at Solr. I want to store just highlights (snippets) at Hbase and I want to retrieve them from Hbase when needed. What do you think about separating just highlights from Solr and storing them into Hbase at Solrclod. By the way if you explain at which process and how highlights are genareted at Solr you are welcome. 2013/4/9 Otis Gospodnetic otis.gospodne...@gmail.com You may also be interested in looking at things like solrbase (on Github). Otis -- Solr ElasticSearch Support http://sematext.com/ On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; First of all should mention that I am new to Solr and making a research about it. What I am trying to do that I will crawl some websites with Nutch and then I will index them with Solr. (Nutch 2.1, Solr-SolrCloud 4.2 ) I wonder about something. I have a cloud of machines that crawls websites and stores that documents. Then I send that documents into SolrCloud. Solr indexes that documents and generates indexes and save them. 
I know that from Information Retrieval theory: it *may* not be efficient to store indexes at a NoSQL database (they are something like linked lists and if you store them in such kind of database you *may* have a sparse representation -by the way there may be some solutions for it. If you explain them you are welcome.) However Solr stores some documents too (i.e. highlights) So some of my documents will be doubled somehow. If I consider that I will have many documents, that dobuled documents may cause a problem for me. So is there any way not storing that documents at Solr and pointing to them at Hbase(where I save my crawled documents) or instead of pointing directly storing them at Hbase (is it efficient or not)?
Re: How do I recover the position and offset a highlight for solr (4.1/4.2)?
Hi, It doesn't have the offset information, but check out my patch https://issues.apache.org/jira/browse/SOLR-4722 which outputs the position of each term that's been matched. I'm eager to get some feedback on this approach and any improvements that might be suggested. Cheers, Tricia On Wed, Mar 27, 2013 at 8:28 AM, Skealler Nametic bchaillou...@gmail.com wrote: Hi, I would like to retrieve the position and offset of each highlight found. I searched on the internet, but I have not found the exact solution to my problem...
Re: how to display groups along with matching terms in solr auto-suggestion?
Hi, Try the Solr Suggester, though I'm not sure if you can group with it. I tried http://search-lucene.com/?q=suggester+group&fc_project=Solr but it doesn't seem to yield much. If you need to group suggestions like what you see on http://search-lucene.com/ for example, we use our own AC from http://sematext.com/products/autocomplete/index.html for that. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 7:38 AM, sharmila thapa shar...@gmail.com wrote: Hi, I have used Terms for auto-suggestion, but it just lists the terms from the index that match terms.prefix. Along with these term suggestions, I have to display the product groups that match the input prefix. Is this possible with Solr auto-suggest? Could somebody please help me with this issue?
Re: SolrCloud Leader Response Mechanism
Oui, ZK holds the map. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:33 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; You said: It can just do it because it knows where things are. Does it learn it from Zookeeper? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com If query comes to shard X on some node and this shard X is NOT a leader, but HAS data, it will just execute the query. If it needs to query shards on other nodes, it will have the info about which shards to query and will just do that and aggregate the results. It doesn't have to ask leader for permission, for info, etc. It can just do it because it knows where things are. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; When I speak with proper terms I want to ask that: is there a data locality of spatial locality ( http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html - I mean if you have data on your machine, use it and don't search it anywhere else, just search for remaining parts) at querying on a leader of SolrCloud? 2013/4/16 Mark Miller markrmil...@gmail.com Leaders don't have much to do with querying - the node that you query will determine what other nodes it has to query to search the whole index and do a scatter/gather for you. (Though in some cases that request can be proxied to another node) - Mark On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: When a leader responses for a query, does it says that: If I have the data what I am looking for, I should build response with it, otherwise I should find it anywhere. Because it may be long to search it? or does it says I only index the data, I will tell it to other guys to build up the response query?
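The scatter/gather behaviour described in this thread can be sketched roughly as follows. This is a toy model, not Solr code: CLUSTER_STATE stands in for the shard map held in ZooKeeper, and the node names, shard names, scores and the "prefer a local copy" rule are all assumptions made for the illustration. The point it shows is that the node receiving the query consults the map, queries one copy of every shard, and merges the partial results itself, with no leader involved.

```python
import heapq

# Hypothetical cluster state, standing in for the map held in ZooKeeper:
# shard name -> nodes hosting a copy of that shard.
CLUSTER_STATE = {
    "shard1": ["node-a", "node-b"],
    "shard2": ["node-c", "node-d"],
}

# Hypothetical per-shard hits as (score, doc_id) pairs.
SHARD_RESULTS = {
    "shard1": [(0.9, "d1"), (0.4, "d7"), (0.2, "d3")],
    "shard2": [(0.8, "d5"), (0.6, "d2")],
}


def query_shard(shard, node, rows):
    """Pretend to run the query on one copy of a shard; return its top rows."""
    return heapq.nlargest(rows, SHARD_RESULTS[shard])


def scatter_gather(receiving_node, rows):
    """Query one copy of every shard and merge the partial top-N lists."""
    partial = []
    for shard, nodes in CLUSTER_STATE.items():
        # Assumption of this sketch: use the local copy when the receiving
        # node hosts the shard, otherwise pick any listed node.
        node = receiving_node if receiving_node in nodes else nodes[0]
        partial.extend(query_shard(shard, node, rows))
    # Final merge: keep the best `rows` hits across all shards.
    return [doc_id for _, doc_id in heapq.nlargest(rows, partial)]
```

Calling scatter_gather("node-b", 3) queries the local copy of shard1 plus one copy of shard2 and returns the three best document ids overall.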
Re: Some Questions About Using Solr as Cloud
See https://issues.apache.org/jira/browse/SOLR-4532 https://issues.apache.org/jira/browse/SOLR-1535 https://issues.apache.org/jira/browse/SOLR-4619 Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 7:37 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Erick; Thanks for the explanation. You said: You cannot transfer just the indexed form of a document from one core to another, you have to re-index the doc. why do you think like that? 2013/4/16 Erick Erickson erickerick...@gmail.com Yes. Every node is really self-contained. When you send a doc to a cluster where each shard has a replica, the raw doc is sent to each node of that shard and indexed independently. About old docs, it's the same as Solr 3.6. Data associated with docs stays around in the index until it's merged away. You cannot transfer just the indexed form of a document from one core to another, you have to re-index the doc. Best Erick On Mon, Apr 15, 2013 at 7:46 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Jack; I see that SolrCloud makes everything automated. When I use SolrCloud is it true that: there may be more than one computer responsible for indexing at any time? 2013/4/15 Jack Krupansky j...@basetechnology.com There are no masters or slaves in SolrCloud - it's fully distributed. Some cluster nodes will be leaders (of the shard on that node) at a given point in time, but different nodes may be leaders at different points in time as they become elected. In a distributed cluster you would never want to store documents only on one node. Sure, you can do that by setting the replication factor to 1, but that defeats half the purpose for SolrCloud. Index transfer is automatic - SolrCloud supports fully distributed update. You might be getting confused with the old Master-Slave-Replication model that Solr had (and still has) which is distinct from SolrCloud. 
-- Jack Krupansky -Original Message- From: Furkan KAMACI Sent: Sunday, April 14, 2013 7:45 PM To: solr-user@lucene.apache.org Subject: Some Questions About Using Solr as Cloud I read wiki and reading SolrGuide of Lucidworks. However I want to clear something in my mind. Here are my questions: 1) Does SolrCloud lets a multi master design (is there any document that I can read about it)? 2) Let's assume that I use multiple cores i.e. core A and core B. Let's assume that there is a document just indexed at core B. If I send a search request to core A can I get result? 3) When I use multi master design (if exists) can I transfer one master's index data into another (with its slaves or not)? 4) When I use multi core design can I transfer one index data into another core or anywhere else? By the way thanks for the quick responses and kindness at mail list.
Re: SolrCloud Leader Response Mechanism
Replica asks to Zookeper and Leader does not do anything. Thanks for your answer Otis. 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com Oui, ZK holds the map. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 6:33 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Otis; You said: It can just do it because it knows where things are. Does it learn it from Zookeeper? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com If query comes to shard X on some node and this shard X is NOT a leader, but HAS data, it will just execute the query. If it needs to query shards on other nodes, it will have the info about which shards to query and will just do that and aggregate the results. It doesn't have to ask leader for permission, for info, etc. It can just do it because it knows where things are. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:23 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Mark; When I speak with proper terms I want to ask that: is there a data locality of spatial locality ( http://www.roguewave.com/portals/0/products/threadspotter/docs/2011.2/manual_html_linux/manual_html/ch_intro_locality.html - I mean if you have data on your machine, use it and don't search it anywhere else, just search for remaining parts) at querying on a leader of SolrCloud? 2013/4/16 Mark Miller markrmil...@gmail.com Leaders don't have much to do with querying - the node that you query will determine what other nodes it has to query to search the whole index and do a scatter/gather for you. (Though in some cases that request can be proxied to another node) - Mark On Apr 16, 2013, at 7:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: When a leader responses for a query, does it says that: If I have the data what I am looking for, I should build response with it, otherwise I should find it anywhere. Because it may be long to search it? 
or does it says I only index the data, I will tell it to other guys to build up the response query?
Re: Storing Solr Index on NFS
Yesterday, we spent 1 hour with a client looking at their cluster's performance metrics in SPM, their indexing logs, etc., trying to figure out why some indexing was slower than it should have been. We traced issues to network hiccups, to VMs that would move from host to host, etc. It was a really fancy and powerful system in terms of hardware resources, but in the end a bit too far from just locally attached HDD or SSD, which would not have issues like the ones we found. I'd stay away from NFS for the same reason - it's another moving part on the other side of the network. Otis -- Solr ElasticSearch Support http://sematext.com/

On Tue, Apr 16, 2013 at 7:15 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Walter; You said: It is not safe to share Solr index files between two Solr servers. Why do you think so?

2013/4/16 Tim Vaillancourt t...@elementspace.com If centralization of storage is your goal in choosing NFS, iSCSI works reasonably well with Solr indexes, although good local storage will always be the overall winner. I noticed a near 5% degradation in overall search performance (casual testing, nothing scientific) when moving 40-50GB indexes to iSCSI (10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup. Tim

On 15/04/13 09:59 AM, Walter Underwood wrote: Solr 4.2 does have field compression, which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, because I think the latency of NFS is what causes problems. wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote: Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible. Thanks, Saqib

On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote: On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: Greetings, Are there any issues with storing Solr indexes on an NFS share? Also, any recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS. It can be very slow; I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. wunder -- Walter Underwood wun...@wunderwood.org
Re: Is cache useful for my scenario?
Hi Sam, Sounds like you may want to disable caches, yes. But instead of guessing, just look at the stats and configure your caches based on that. You can get stats from the Solr Admin page or, if you need long-term stats and performance patterns, use SPM for Solr or something similar. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 5:25 AM, samabhiK qed...@gmail.com wrote: Hi, I am new to Solr and wish to use version 4.2.x for my app in production. I want to show hundreds and thousands of markers on a map with contents coming from Solr. As the user moves around the map and pans, the browser will fetch data/markers using a BBOX filter (based on the map's viewport boundary). There will be a lot of data indexed in Solr. My question is, does caching help in my case? As the filter queries will vary for almost all users (because the viewport latitude/longitude would vary), in what ways can I use caching to increase performance? Should I completely turn off caching? If you can offer suggestions from your experience, it would be really nice. Thanks Sam -- View this message in context: http://lucene.472066.n3.nabble.com/Is-cache-useful-for-my-scenario-tp4056250.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 3.5 core rename issue
On 4/16/2013 2:39 PM, Jie Sun wrote:
> Hi Shawn, I do have persistent="true" in my solr.xml:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>   <cores adminPath="/admin/cores">
>     <core name="default" instanceDir="./"/>
>     <core name="413a" instanceDir="./"/>
>     <core name="blah" instanceDir="./"/>
>     ...
>   </cores>
> </solr>
>
> the command I ran was to rename from '413' to '413a'.

I think I see the problem. You have three cores that all point to the same instanceDir, and no dataDir parameter. Normally the dataDir parameter defaults to "data" in the instanceDir, but perhaps if you have multiple cores sharing the instanceDir, it will use the core name instead. With this solr.xml, I can see why you're having a problem. The solr.xml file doesn't tell Solr where the dataDir is. If you set up an explicit dataDir option for each core, then it should work out the way you expect it to. Here's an excerpt from my solr.xml:

<core name="s0live" instanceDir="/index/solr/cores/s0_1/" dataDir="/index/solr/data/s0_1"/>
<core name="s0build" instanceDir="cores/s0_0/" dataDir="../../data/s0_0"/>

You are correct about what happens with my directories on a swap, but because solr.xml gets updated and has an explicit dataDir for each core, everything works.

Thanks, Shawn
Re: Storing Solr Index on NFS
I don't want to bother but I try to understand that part: When yo perform a commit in solr you have (for an instant) two versions of the index. The commit produces new segments (with new documents, new deletions, etc). After creating these new segments a new index searcher is created and its caches begin to autowarm. At this point the old index searcher that you were using is still active receiving requests. After the new index searcher finishes loading and autowarming the old searcher is discarded. So does it mean that when I have multiple Solr servers and a shared index, I should synchronize the caches at that different machines RAMs? 2013/4/17 Otis Gospodnetic otis.gospodne...@gmail.com Yesterday, we spent 1 hour with a client looking at their cluster's performance metrics SPM, their indexing logs, etc. trying to figure out why some indexing was slower than it should have been. We traced issues to network hickups, to VMs that would move from host to host, etc. Really fancy and powerful system in terms of hardware resources, but in the end a bit too far from just locally attached HDD or SDD that would not have issues like the ones we found. I'd stay away from NFS for the same reason - it's another moving part on the other side of the network. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 16, 2013 at 7:15 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Walter; You said: It is not safe to share Solr index files between two Solr servers. Why do you think like that? 2013/4/16 Tim Vaillancourt t...@elementspace.com If centralization of storage is your goal by choosing NFS, iSCSI works reasonably well with SOLR indexes, although good local-storage will always be the overall winner. I noticed a near 5% degredation in overall search performance (casual testing, nothing scientific) when moving a 40-50GB indexes to iSCSI (10GBe network) from a 4x7200rpm RAID 10 local SATA disk setup. 
Tim On 15/04/13 09:59 AM, Walter Underwood wrote: Solr 4.2 does have field compression which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, because I think the latency of NFS is what causes problems. wunder On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote: Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there new are things in Solr 4 and NFS 4.1 that make the storing of indexes on a NFS mount feasible. Thanks, Saqib On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwoodwunder@wunderwood. ** org wun...@wunderwood.orgwrote: On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: Greetings, Are there any issues with storing Solr Indexes on a NFS share? Also any recommendations for using NFS for Solr indexes? I recommend that you do not put Solr indexes on NFS. It can be very slow, I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Re: Push/pull model between leader and replica in one shard
On Apr 16, 2013, at 1:36 AM, SuoNayi suonayi2...@163.com wrote:

> Hi, can someone explain in more detail what model is used to sync docs between the leader and the replicas in a shard? The model can be push or pull. Suppose I have only one shard that has 1 leader and 2 replicas. When the leader receives an update request, does it first scatter the request to each available and active replica and then process the request locally? In this case, if the replicas are able to keep up with the leader, can I think of this as a push model in which the leader pushes updates to its replicas?

Currently, the leader adds the doc locally and then sends it to all replicas concurrently.

> What happens if a replica is behind the leader? Will the replica pull docs from the leader and keep track of the incoming updates from the leader in a log (called the tlog)? If so, when it completes pulling docs, will it replay the updates in the tlog?

If an update forwarded from a leader to a replica fails, it's likely because that replica died. Just in case, the leader will ask that replica to enter recovery. When a node comes up and is not a leader, it also enters recovery. Recovery tries to peersync from the leader, and if that fails (peersync works if the node is off by about 100 updates), it replicates the entire index.

If you are interested in more details on the SolrCloud architecture, I've given a few talks on it - two of them here:
http://vimeo.com/43913870
http://www.youtube.com/watch?v=eVK0wLkLw9w

- Mark
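The update and recovery flow described above can be modelled in a few lines. This is a toy simulation under stated assumptions, not Solr's implementation: the Replica class, the function names and the returned strategy labels are invented for illustration, the window of 100 is only the approximate figure quoted in the message, and replicas are updated one after another here although the message says the leader sends to them concurrently.

```python
PEERSYNC_WINDOW = 100  # "about 100 updates" per the message; approximate


def choose_recovery(missed_updates, window=PEERSYNC_WINDOW):
    """Pick a recovery strategy from the number of updates a node missed."""
    if missed_updates <= 0:
        return "none"
    if missed_updates <= window:
        return "peersync"                # fetch only the missed updates
    return "full-index-replication"      # too far behind: copy whole index


class Replica:
    """Hypothetical replica that fails updates while it is down."""

    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive
        self.docs = []
        self.recovering = False

    def add(self, doc):
        if not self.alive:
            raise ConnectionError(self.name + " is unreachable")
        self.docs.append(doc)


def leader_update(leader_docs, replicas, doc):
    """Leader indexes locally first, then forwards the doc to each replica."""
    leader_docs.append(doc)
    for replica in replicas:  # sequential in this sketch, for simplicity
        try:
            replica.add(doc)
        except ConnectionError:
            # A failed forward means the replica is asked to enter recovery.
            replica.recovering = True
```

A replica that was down for a handful of updates would take the "peersync" branch when it comes back, while one that missed thousands would fall through to copying the whole index.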
Re: Push/pull model between leader and replica in one shard
Really nice presentation.
Re: Is cache useful for my scenario?
: There will be a lot of data that will be indexed in Solr. My question is,
: does caching help in my case? As the filter queries will vary for almost all
: users (because the viewport latitude/longitude would vary), in what ways
: can I use caching to increase performance? Should I completely turn off
: caching?

You can use the "cache" localparam on your fq params to disable caching of those specific bbox filter queries without needing to completely disable caching. That way, if you have any other filter queries that could leverage caching (or if you use faceting, etc.), they can still take advantage of the caches.

http://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters

-Hoss
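As a rough illustration of that suggestion, the sketch below builds a query string in which only the per-user viewport filter carries `cache=false`, while a filter shared by many users is left cacheable. The helper names, the `location` field, and the `type:restaurant` filter are hypothetical; the range syntax assumes a lat/lon field type that accepts `[lat,lon TO lat,lon]` queries, so adjust it to your schema.

```python
from urllib.parse import urlencode


def viewport_fq(field, south, west, north, east):
    """Range filter over a lat/lon field for the visible map area.

    The {!cache=false} local param tells Solr not to store this filter in
    the filterCache, since each user's viewport is unlikely to repeat.
    """
    return "{!cache=false}%s:[%s,%s TO %s,%s]" % (field, south, west, north, east)


def build_query(q, viewport, shared_filters=()):
    """Return a URL-encoded query string with one uncached viewport filter
    plus any number of ordinary (cacheable) filter queries."""
    params = [("q", q), ("fq", viewport_fq(*viewport))]
    params += [("fq", f) for f in shared_filters]
    return urlencode(params)
```

Calling `build_query("*:*", ("location", 45.0, -94.0, 46.0, -93.0), ["type:restaurant"])` yields a string that can be appended to a `/select?` URL. The `type:restaurant` filter stays in the filterCache and can be reused across users, while the viewport filter does not evict other entries.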