Re: OutOfMemoryError
I upgraded java to version 7 and everything seems to be stable now! BR, Arkadi On 03/25/2013 09:54 PM, Shawn Heisey wrote: On 3/25/2013 1:34 AM, Arkadi Colson wrote: I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as parameters. I also added -XX:+UseG1GC to the java process. But now the whole machine crashes! Any idea why? Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0 Linux (the out of memory killer, or oom-killer) is deciding to kill the java process because the entire machine is out of memory. Normally it kills off the process using the most memory. This will only happen when all RAM is fully allocated to programs as well as all available swap space. At this point, this is not a direct problem with Solr. It *could* be a problem with Java itself, but that is not very likely. Because Java is set to use only 8GB out of the 12GB you have on the machine, this suggests that you have at least one other memory-intensive application on the same server. Are you using the same hardware to run a website and/or database? Solr works best on dedicated hardware. Thanks, Shawn
Disc space and replication
Hi, when replication is down for some time or an instance has crashed for some reason, replication always starts over again from the beginning. This means it copies the whole shard, about 150 GB, so we need a disc of at least about 300 GB. I've read somewhere that Solr will replicate everything when 100 entries are missing? Why is that? Is it configurable? What about optimization? Is it still needed in SolrCloud? Will it reduce the disc usage? Does it also need twice the shard size to run successfully? Is it correct that currently the only option is to make more shards to reduce the disc space? Is there any progress on the resharding option the developers are working on? Thx! -- Kind regards, Arkadi Colson Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen T +32 11 64 08 80 • F +32 11 64 08 81
Elasticsearch with kerberos
Hi, Is there any integration of Solr with Kerberos? Thanks and regards, Debika Mukherjee CLOUD BBSR VOIP 6743071561
Re: [ScriptUpdateProcessor] Params aren't being picked up from solrconfig
I cannot believe I overlooked this :} Thanks for helping me out. It works fine now. I'd like to contribute to the wiki page http://wiki.apache.org/solr/ScriptUpdateProcessor and add a Python example. So, if anyone could give me write access or tell me how to do this without it, I'd be happy to contribute. On Wed, Mar 27, 2013 at 12:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
: none of the params I specify in solrconfig.xml are being picked up. The
: error I'm getting is: NameError: global name 'params' is not defined.
...
: <updateRequestProcessorChain name="script">
: <processor class="solr.StatelessScriptUpdateProcessorFactory">
: <str name="script">summarize.py</str>
: </processor>
: <!-- optional parameters passed to script -->
: <lst name="params">
: <str name="from_field">abstract</str>
: <str name="to_field">summary</str>
: </lst>
...that list of params isn't inside the processor tag, so StatelessScriptUpdateProcessorFactory doesn't know anything about it, so it's not passing it to the ScriptEngineManager -Hoss
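Following Hoss's diagnosis, a corrected chain would nest the params list inside the processor element. This is a minimal sketch, not the poster's actual file: the rest of the chain was elided as "..." in the original post, so the closing RunUpdateProcessorFactory shown here is an assumption.

```xml
<updateRequestProcessorChain name="script">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">summarize.py</str>
    <!-- optional parameters passed to the script; must sit inside <processor> -->
    <lst name="params">
      <str name="from_field">abstract</str>
      <str name="to_field">summary</str>
    </lst>
  </processor>
  <!-- assumed: the tail of the chain was not shown in the original post -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With the list in that position, the factory can hand it to the script as the global `params`, which is the name the NameError complained about.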
Re: [ScriptUpdateProcessor] Params aren't being picked up from solrconfig
Hi Rene, Thanks for offering to help with wiki documentation. You'll need to register on the wiki first, then tell us your wiki username, and we'll add you to ContributorsGroup, which will allow you to make edits. Steve On Mar 27, 2013, at 7:40 AM, Rene Nederhand r...@nederhand.net wrote: I cannot believe I overlooked this :} Thanks for helping me out. It works fine now. I'd like to contribute to the wiki page http://wiki.apache.org/solr/ScriptUpdateProcessor and add a Python example. So, if anyone could give me write access or tell me how to do this without it, I'd be happy to contribute. On Wed, Mar 27, 2013 at 12:38 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
: none of the params I specify in solrconfig.xml are being picked up. The
: error I'm getting is: NameError: global name 'params' is not defined.
...
: <updateRequestProcessorChain name="script">
: <processor class="solr.StatelessScriptUpdateProcessorFactory">
: <str name="script">summarize.py</str>
: </processor>
: <!-- optional parameters passed to script -->
: <lst name="params">
: <str name="from_field">abstract</str>
: <str name="to_field">summary</str>
: </lst>
...that list of params isn't inside the processor tag, so StatelessScriptUpdateProcessorFactory doesn't know anything about it, so it's not passing it to the ScriptEngineManager -Hoss
Re: Disc space and replication
On Mar 27, 2013, at 3:57 AM, Arkadi Colson ark...@smartbit.be wrote: Hi, when replication is down for some time or an instance has crashed for some reason, replication always starts over again from the beginning. This means it copies the whole shard, about 150 GB, so we need a disc of at least about 300 GB. I've read somewhere that Solr will replicate everything when 100 entries are missing? Why is that? Is it configurable? Not configurable. Are you using 4.2? It will not recopy any segment files that already exist on the replica - 4.0 and 4.1 copied all the files regardless in SolrCloud mode. What about optimization? Is it still needed in SolrCloud? Will it reduce the disc usage? Does it also need twice the shard size to run successfully? I wouldn't optimize if you will continue to add/update documents. Use merge policy settings to control the segment count. Is it correct that currently the only option is to make more shards to reduce the disc space? ?? Is there any progress on the resharding option the developers are working on? Yes, see the JIRA issue on shard splitting. - Mark Thx! -- Kind regards, Arkadi Colson Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen T +32 11 64 08 80 • F +32 11 64 08 81
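Mark's suggestion to control the segment count through the merge policy, rather than optimizing, could look roughly like this in solrconfig.xml. This is only a sketch: the two values shown are the TieredMergePolicy defaults and are assumptions to tune, not settings taken from this thread.

```xml
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- how many segments may be merged in one pass -->
    <int name="maxMergeAtOnce">10</int>
    <!-- how many similarly sized segments are tolerated before a merge -->
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```

Lowering segmentsPerTier generally leaves fewer segments on disk at the cost of more merge work during indexing.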
Re: Solrcloud 4.1 Collection with multiple slices only use
So - I must be missing something very basic here and I've gone back to the Wiki example. After setting up the two shard example in the first tutorial and indexing the three example documents, look at the shards in the Admin UI. The documents are stored in the index where the update was directed - they aren't distributed across both shards. Release notes state that the compositeId router is the default when using the numShards parameter? I want an even distribution of documents based on ID across all shards. Any suggestions on what I'm screwing up? Chris On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote: I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: Problem: After setting up a multiple collection Solrcloud 4.1 instance on seven servers, when I index the documents they aren't distributed across the index slices. It feels as though I don't actually have a cloud implementation, yet everything I see in the admin interface and zookeeper implies I do. I feel as if I'm overlooking something obvious, but have not been able to figure out what. Configuration: Seven servers and four collections, each with 12 slices (no replica shards yet). Zookeeper configured in a three node ensemble. When I send documents to Server1/Collection1 (which holds two slices of collection1), all the documents show up in a single index shard (core). Perhaps related, I have found it impossible to get Solr to recognize the server names with anything but a literal host=servername parameter in the solr.xml. hostname parameters, host files, network, dns, are all configured correctly. I have a Solr 4.0 single collection set up similarly and it works just fine.
I'm using the same schema.xml and solrconfig.xml files on the 4.1 implementation with only the luceneMatchVersion changed to LUCENE_41. Sample solr.xml from server1:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1" shareSchema="true" zkClientTimeout="6">
    <core collection="col201301" shard="col201301s04" instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01" dataDir="/solr/col201301/col201301s04sh01/data"/>
    <core collection="col201301" shard="col201301s11" instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01" dataDir="/solr/col201301/col201301s11sh01/data"/>
    <core collection="col201302" shard="col201302s06" instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01" dataDir="/solr/col201302/col201302s06sh01/data"/>
    <core collection="col201303" shard="col201303s01" instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01" dataDir="/solr/col201303/col201303s01sh01/data"/>
    <core collection="col201303" shard="col201303s08" instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01" dataDir="/solr/col201303/col201303s08sh01/data"/>
    <core collection="col201304" shard="col201304s03" instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01" dataDir="/solr/col201304/col201304s03sh01/data"/>
    <core collection="col201304" shard="col201304s10" instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01" dataDir="/solr/col201304/col201304s10sh01/data"/>
  </cores>
</solr>
Thanks Chris
Using multiple text files for Suggestor dictionarys
I'm using the Suggester component for autocomplete. I have a variety of types of suggestions that I would like to offer, such as locations, company names, products, and dictionary words. These lists vary in size and volatility, so keeping them all in the same text file is not the most convenient. I'm using text files because I want the ability to add weights to the terms suggested. Is it possible to use multiple text files? I tried the following:
<!-- WFSTLookup suggest component -->
<searchComponent class="solr.SpellCheckComponent" name="suggestword">
  <lst name="spellchecker">
    <str name="name">suggestword</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="storeDir">suggestword</str>
    <str name="buildOnCommit">false</str>
    <!-- Suggester properties -->
    <bool name="exactMatchFirst">true</bool>
    <str name="sourceLocation">../data/words.txt</str>
    <str name="sourceLocation">../data/cities.txt</str>
  </lst>
But the second list, the cities, is apparently not detected, even after restarting Tomcat and rebuilding the dictionary. Can this be done? If not, how would you recommend managing different dictionaries? Thanks, Eric Wilson
Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely
Hi Nate; This may be off topic, but could you explain why you want to use Tomcat instead of Jetty or Embedded Jetty? 2013/3/27 Michael Della Bitta michael.della.bi...@appinions.com You're using the blocking IO connector, which isn't so great for heavy loads. Give this a shot... You'll end up with 8192 max connections by default, although this is tunable too: Run: apt-get install libapr1 libtcnative-1 Add this to the list of Listeners at the top of server.xml: <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="off" /> These instructions assume you're running Tomcat 6 or 7. Here's some documentation: http://tomcat.apache.org/tomcat-7.0-doc/apr.html http://tomcat.apache.org/tomcat-7.0-doc/config/http.html Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Mar 26, 2013 at 5:31 PM, Nate Fox n...@neogov.com wrote: We're not using ELB and I have no idea which connector I'm using - I'm guessing whatever is default (I'm a total noob). This is from my server.xml: <Connector port="8080" protocol="HTTP/1.1" connectionTimeout="6" URIEncoding="UTF-8" redirectPort="8443" /> -- Nate Fox Sr Systems Engineer o: 310.658.5775 m: 714.248.5350 Follow us @NEOGOV http://twitter.com/NEOGOV and on Facebook http://www.facebook.com/neogov NEOGOV http://www.neogov.com/ is among the top fastest growing software companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500, and the LA Business Journal. We are hiring! http://www.neogov.com/#/company/careers On Tue, Mar 26, 2013 at 1:02 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Nate, We just cleared up a problem similar to this by ditching Elastic Load Balancer and switching over to the APR connector in Tomcat. Are you using either of those?
Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Mar 26, 2013 at 2:58 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Nate, Try adding some warmup queries and making sure the setting for using the cold searcher in solrconfig.xml is set to false. Your warmup queries should use facets and sorting if your normal queries use them. In SPM you'll actually see how much time warming up takes, so you'll get a better idea of the cost of that (when you don't do it). Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Mar 26, 2013 at 2:50 PM, Nate Fox n...@neogov.com wrote: I was wondering if the warmup stuff was one of the culprits (we dont have warmup's at all - the configs are pretty stock). As for the system, it seems capable of quite a bit more: memory usage is ~30%, jvm-memory (from the dashboard) is very low (~220Mb out of 3Gb) and load below 1.00. The seed data and queries were put together by one of our developers. I've put all the solrmeter files here: https://gist.github.com/natefox/ee5cef3d4fbbc73e9bce Unfortunately I'm quite new to solr (and tomcat) so I'm not entirely sure which file does which specifically. Does the system's reaction to a 'fast load' without a warmup sound normal? I would have expected the first couple hundred queries to be very slow (500ms) and then the system catch up after a while. But it just dies very quickly and never recovers. I'll check out your SPM - I've seen it mentioned before. Thanks! -- Nate Fox Sr Systems Engineer o: 310.658.5775 m: 714.248.5350 Follow us @NEOGOV http://twitter.com/NEOGOV and on Facebookhttp://www.facebook.com/neogov NEOGOV http://www.neogov.com/ is among the top fastest growing software companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500, and the LA Business Journal. We are hiring! 
http://www.neogov.com/#/company/careers On Tue, Mar 26, 2013 at 11:12 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, In short, certain data structures need to load from index in the beginning, (for sorting and faceting) caches need to warm up, JVM needs to warm up, etc., so going slowly in the beginning makes sense. Why things die after that is a different Q. Maybe it OOMs? Maybe queries are very complex? What do your queries look like? I see newrelic.jar in the command-line. May want to try SPM for Solr, it has better Solr metrics. Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Mar 26, 2013 at 1:24 PM, Nate Fox n...@neogov.com wrote: I'm new to solr and I'm load testing our setup to see what we can handle.
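Otis's advice in this thread (add warmup queries that exercise facets and sorting, and do not serve requests from a cold searcher) maps onto solrconfig.xml roughly as follows. This is a sketch under assumptions: the field names `category` and `price` are hypothetical placeholders for whatever the real queries facet and sort on, and the surrounding `<query>` section of a stock config is otherwise omitted.

```xml
<query>
  <!-- hold incoming requests until the first searcher has been warmed -->
  <useColdSearcher>false</useColdSearcher>

  <!-- run at startup, before the first searcher is put into service -->
  <listener event="firstSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="facet">true</str>
        <!-- hypothetical field names; use the fields your queries really touch -->
        <str name="facet.field">category</str>
        <str name="sort">price asc</str>
      </lst>
    </arr>
  </listener>
</query>
```

A matching `newSearcher` listener would repeat the warming after each commit that opens a new searcher.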
How do I recover the position and offset of a highlight in Solr (4.1/4.2)?
Hi, I would like to retrieve the position and offset of each highlighting found. I searched on the internet, but I have not found the exact solution to my problem...
Re: Elasticsearch with kerberos
On 3/27/2013 5:29 AM, Debika Mukherjee wrote: Is there any integration of Solr with Kerberos? I am pretty sure that the answer is no. Solr has no security features at all - it is intended to live where regular users cannot get to it. Thanks, Shawn
Querying a transitive closure?
This is a question about isA? We want to know if M isA B isA?(M,B) For some M, one might be able to look into M to see its type or which class(es) for which it is a subClass. We're talking taxonomic queries now. But, for some M, one might need to ripple up the transitive closure, looking at all the super classes, etc, recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But, how do you do that in a SolrCloud? Really curious... Many thanks in advance for ideas. Jack
Re: Elasticsearch with kerberos
Debika, Did you really mean to ask about Solr or ElasticSearch (see subject)? I think your best bet is ManifoldCF, where I see some mention of it http://search-lucene.com/?q=kerberos Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 11:55 AM, Shawn Heisey s...@elyograg.org wrote: On 3/27/2013 5:29 AM, Debika Mukherjee wrote: Is there any integration of Solr with Kerberos? I am pretty sure that the answer is no. Solr has no security features at all - it is intended to live where regular users cannot get to it. Thanks, Shawn
Re: Querying a transitive closure?
Hi Jack, Is this really about HTTP and Solr vs. SolrCloud or more whether Solr(Cloud) is the right tool for the job and if so how to structure the schema and queries to make such lookups efficient? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote: This is a question about isA? We want to know if M isA B isA?(M,B) For some M, one might be able to look into M to see its type or which class(es) for which it is a subClass. We're talking taxonomic queries now. But, for some M, one might need to ripple up the transitive closure, looking at all the super classes, etc, recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But, how do you do that in a SolrCloud? Really curious... Many thanks in advance for ideas. Jack
Re: Elasticsearch with kerberos
: Is there any integration of Solr with Kerberos? : I am pretty sure that the answer is no. Solr has no security features at : all - it is intended to live where regular users cannot get to it. The key question is how you define integration of Solr with Kerberos. What is your goal? How do you want Kerberos to be used? Because Solr is a webapp that can run in any servlet container, you may be able to achieve your goals by using a servlet container that already supports Kerberos (i.e. if your goal is Kerberos authentication of clients talking to Solr). But w/o more details as to what it is you actually care about, there's no real way to give you a meaningful answer other than to say that nothing in Solr requires or directly knows about Kerberos authentication. -Hoss
Solr Cloud update process
What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. wunder -- Walter Underwood wun...@wunderwood.org
Solr 4.1 SolrCloud with 1 shard and 3 replicas
I am running Solr 4.1. I have set up SolrCloud with 1 leader and 3 replicas, 4 nodes total. Do query requests sent to a node only query the replica on that node, or are they load-balanced to the entire cluster? Bill
Re: Solr Cloud update process
On 3/27/2013 12:34 PM, Walter Underwood wrote: What do people do for updating, say from 4.1 to 4.2.1, on a live cluster? I need to help our release engineering team create the Jenkins scripts for deployment. Aside from replacing the .war file and restarting your container, there hopefully won't be anything additional required. The subject says SolrCloud, so your config(s) should be in zookeeper. It would generally be a good idea to update luceneMatchVersion to LUCENE_42 in the config(s), unless you happen to know that you're relying on behavior from the old version that changed in the new version. I also make a point of deleting the old extracted version of the .war before restarting, just to be sure there won't be any problems. In theory a servlet container should be able to handle this without intervention, but I don't like taking the chance. Thanks, Shawn
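The version bump Shawn mentions is a one-line change in each solrconfig.xml. A sketch only, with the rest of the file omitted:

```xml
<!-- inside the top-level <config> element of solrconfig.xml -->
<luceneMatchVersion>LUCENE_42</luceneMatchVersion>
```

Because a SolrCloud cluster reads its configs from ZooKeeper, the change would need to be made to the copy stored there, not just to a file on local disk.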
Re: Querying a transitive closure?
Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But, my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas? Thanks Jack On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, Is this really about HTTP and Solr vs. SolrCloud or more whether Solr(Cloud) is the right tool for the job and if so how to structure the schema and queries to make such lookups efficient? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote: This is a question about isA? We want to know if M isA B isA?(M,B) For some M, one might be able to look into M to see its type or which class(es) for which it is a subClass. We're talking taxonomic queries now. But, for some M, one might need to ripple up the transitive closure, looking at all the super classes, etc, recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But, how do you do that in a SolrCloud? Really curious... Many thanks in advance for ideas. Jack
Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely
Update: issue resolved! Cranking up the maxThreads did the trick. Default is 200. I went with 2500 for grins and giggles and things work great. Now, even if I overwhelm the box with too many requests, when the requests back off the box continues to respond. And when I slam the server after it's been restarted (without having warmup queries), it acts as I wanted: queries are slow to respond (upwards of 30s) for the first couple minutes then they start to all be under 25ms and normalize at a very fast pace (obviously as the cache is warmed). Christopher, I could have sworn I tried upping acceptCount, maxConnections and maxThreads in my testing, but with your prodding I tried it again - and that was the solution. I have a couple quick followup questions: - What is the downside of having maxThreads, acceptCount and maxConnections really high? Obviously defaults are there for a reason - I'd like to know what the reasoning is. - Any reason I shouldn't use Tomcat? I just went with it because I figured it was extremely mature and was easy to use with apt-get :) I'll probably toy with the APR as suggested by Michael, as I like the idea of a non-blocking connector. -- Nate Fox Sr Systems Engineer o: 310.658.5775 m: 714.248.5350 On Tue, Mar 26, 2013 at 5:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : * When I set solrmeter to run 4000 queries/min, it will handle a few : hundred queries and then tomcat will stop responding completely to requests : (even though according to lsof -i it is still listening and the java : process is still running). Have you tried using jstack to generate a thread dump of the server to see what it's doing?
: * When I set solrmeter to run 1000 queries/min it runs fine. I can stop : solrmeter after a couple of minutes at that pace and then run at 4000/min : without issue. : : It's as if it needs a ramp up time? Also, I noticed (regardless of ramp up) : that my setup cannot handle 8000/min. The reaction at 8k/min is the same as : if I were to run 4k/min without the ramp up. Of note, only the shard that : solrmeter is pointed to stops responding. The other shard hums along : without incident. Just to clarify: you're running a 2 node SolrCloud cluster, where each node contains a unique shard, and pointing solrmeter at a single node for the queries -- correct? Here's my hunch: you are probably hitting the limit of the number of concurrent connections tomcat will allow (whatever it may be configured to in your setup). In the 8000/min case, you are probably maxing out that limit with direct connections you issue from solrmeter to that single node. In the 4000/min case, each request you issue causes that single node to fire off multiple requests to each shard, and since each shard exists on only one node, you are guaranteeing that you double the number of concurrent requests hitting that first node. In the case where you start w/ 1000/min, and then later ramp up to 4000/min, you are probably causing enough of the queries to be warmed up that they are in the caches on both nodes, so they can be served really fast and return their results before you reach that max number of concurrent connections after you ramp up. I'm no tomcat expert, but skimming the docs, you may want to look at settings like acceptCount, maxConnections, maxThreads, etc... -Hoss
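The fix Nate reports is an attribute on the HTTP connector in Tomcat's server.xml. The following is a sketch, not his actual file: `maxThreads="2500"` is the value from his message, `connectionTimeout="20000"` is the value from Tomcat's stock server.xml and is an assumption here (his own value appears truncated in the archive), and `acceptCount` is shown at its documented default of 100 only to make the knob Hoss mentions visible.

```xml
<!-- maxThreads: request-processing threads (Tomcat default 200) -->
<!-- acceptCount: queue length for connections once all threads are busy -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"
           redirectPort="8443"
           maxThreads="2500"
           acceptCount="100" />
```

This matches Hoss's hunch: in a two-shard cluster each incoming query fans out into further requests against the same connector, so a thread pool sized only for external traffic can be exhausted sooner than expected.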
Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas
They are load-balanced across the cluster unless you pass the distrib=false param. - Mark On Mar 27, 2013, at 2:51 PM, Bill Au bill.w...@gmail.com wrote: I am running Solr 4.1. I have set up SolrCloud with 1 leader and 3 replicas, 4 nodes total. Do query requests send to a node only query the replica on that node, or are they load-balanced to the entire cluster? Bill
Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas
Requests to a node in your example would be answered by that node (no need to distribute; it's a single shard system) and it would not internally be routed otherwise either. Ultimately it is up to the client to load-balance the initial requests into a SolrCloud cluster, but internally in a multi-shard distributed search request it will be load balanced beyond that initial node. CloudSolrServer does load balance, so if you're using that client it'll randomly pick a shard to send to from the client-side. If you're using some other mechanism, it'll request directly to whatever node that you've specified directly for that initial request. Erik p.s. Thanks for attending the webinar, Bill! I saw your name as one of the question askers. Hopefully all that stuff I made up is close to the truth :) On Mar 27, 2013, at 14:51 , Bill Au wrote: I am running Solr 4.1. I have set up SolrCloud with 1 leader and 3 replicas, 4 nodes total. Do query requests send to a node only query the replica on that node, or are they load-balanced to the entire cluster? Bill
Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely
On 3/27/2013 1:16 PM, Nate Fox wrote: I have a couple quick followup questions: - What is the downside of having a maxThreads, acceptCount and maxConnections really high? Obviously defaults are there for a reason - I'd like to know what the reasoning is. - Any reason I shouldnt use Tomcat? I just went with it because I figured it was extremely mature and was easy to use with apt-get :) The maxThreads parameter in the jetty config that's included with Solr is set to 1 - this is the value chosen by Solr's development team. Your setting of 2500 should be perfectly fine, and it is definitely not really high. The default of 200 in your distribution is very low. Tomcat is certainly a viable solution, one used by many. It is very mature and has proven itself. The really nice thing with using an OS-packaged version is that you don't have to write or change the init script. I use the jetty that was included with Solr, and had to write my own init script. Jetty, especially the stripped-down version included with Solr, has a smaller footprint than tomcat. The bells and whistles are not required. It is not better or worse than tomcat, just another choice. Thanks, Shawn
Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely
On Mar 27, 2013, at 3:29 PM, Shawn Heisey s...@elyograg.org wrote: The maxThreads parameter in the jetty config that's included with Solr is set to 1 Yonik raised this at some point if I remember right - it helps avoid some distrib deadlock issue. - Mark
Warming queries and Solr Cloud - just curious ...
When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim
Re: Warming queries and Solr Cloud - just curious ...
Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim
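The local-only warming Tim and Mark agree on could be expressed in solrconfig.xml along these lines. A minimal sketch: the `*:*` query is only a placeholder for real warming queries, which are not given in this thread.

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <!-- placeholder query; substitute representative production queries -->
      <str name="q">*:*</str>
      <!-- keep the warming query on this core instead of fanning out -->
      <str name="distrib">false</str>
    </lst>
  </arr>
</listener>
```

Each core then warms its own caches after a commit without sending the same query to every other node in the cluster.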
Re: Warming queries and Solr Cloud - just curious ...
Ok - thanks for confirming Mark - I'll add that to the wiki. Cheers, Tim On Wed, Mar 27, 2013 at 1:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim
Re: Warming queries and Solr Cloud - just curious ...
This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim
Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas
Thanks for the info, Erik. I had gone through the tutorial in the SolrCloud Wiki and verified that queries are load balanced in the two shard cluster with shard replicas setup. I was wondering if I need to explicitly specify distrib=false in my single shard setup. Glad to see that Solr is doing the right thing by default in my case. Bill P.S. Thanks for a very informative webinar. I am going to recommend it to my co-workers once the recording is available. On Wed, Mar 27, 2013 at 3:26 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Requests to a node in your example would be answered by that node (no need to distribute; it's a single shard system) and it would not internally be routed otherwise either. Ultimately it is up to the client to load-balance the initial requests into a SolrCloud cluster, but internally in a multi-shard distributed search request it will be load balanced beyond that initial node. CloudSolrServer does load balance, so if you're using that client it'll randomly pick a shard to send to from the client-side. If you're using some other mechanism, it'll request directly to whatever node that you've specified directly for that initial request. Erik p.s. Thanks for attending the webinar, Bill! I saw your name as one of the question askers. Hopefully all that stuff I made up is close to the truth :) On Mar 27, 2013, at 14:51 , Bill Au wrote: I am running Solr 4.1. I have set up SolrCloud with 1 leader and 3 replicas, 4 nodes total. Do query requests sent to a node only query the replica on that node, or are they load-balanced to the entire cluster? Bill
Query on all dynamic fields or wildcard field query
Hi All, First I have to apologize and admit that I'm asking this question before doing any real research =( Was hoping for some preliminary help before I start this endeavor tomorrow. So here goes: Can I query for a value in multiple (wildcarded) fields? For example, if I have dynamic fields fieldName_someToken (e.g. fieldName_1, fieldName_2, fieldName_3), can I construct a query like fieldName_*:someValue? The query itself doesn't work, but is there a way to query numerous dynamic fields without explicitly listing them? Thanks, Luis
Re: Warming queries and Solr Cloud - just curious ...
In our case, yes - same non-distrib query is warmed on each node. Seems like you'd need something a little more dynamic than statically configured warming queries in solrconfig.xml for targeting specific shards. Tim On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote: This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim
Re: Warming queries and Solr Cloud - just curious ...
This jira looks like it addresses this. https://issues.apache.org/jira/browse/SOLR-3081 I'll run a quick test. On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote: In our case, yes - same non-distrib query is warmed on each node. Seems like you'd need something a little more dynamic than statically configured warming queries in solrconfig.xml for targeting specific shards. Tim On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote: This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim -- Joel Bernstein Professional Services LucidWorks
Re: Warming queries and Solr Cloud - just curious ...
I ran a quick test and distrib=false is being tacked on automatically. Here is the log record: INFO: [collection1] webapp=null path=null params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1 status=0 QTime=17 So I think this is OK. On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com wrote: This jira looks like it addresses this. https://issues.apache.org/jira/browse/SOLR-3081 I'll run a quick test. On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote: In our case, yes - same non-distrib query is warmed on each node. Seems like you'd need something a little more dynamic than statically configured warming queries in solrconfig.xml for targeting specific shards. Tim On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote: This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
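For anyone who wants to check their own logs the same way, here is a minimal Python sketch. It assumes the text inside params={...} is an ordinary query string with `&` between the parameters (sort, event, q, distrib); the helper name `warming_params` is hypothetical and is not part of Solr.

```python
from urllib.parse import parse_qs

def warming_params(log_line):
    """Extract the params={...} section of a Solr request log line as a dict."""
    start = log_line.index("params={") + len("params={")
    end = log_line.index("}", start)
    # parse_qs decodes '+' as a space and returns a list of values per key;
    # keep only the first value of each parameter.
    return {k: v[0] for k, v in parse_qs(log_line[start:end]).items()}

# Log record from the test above, with '&' assumed between parameters
line = ("INFO: [collection1] webapp=null path=null "
        "params={sort=price+asc&event=newSearcher&q=solr&distrib=false} "
        "hits=1 status=0 QTime=17")
params = warming_params(line)
print(params["event"], params["distrib"])
```

If `distrib` comes back as `false` on a `newSearcher` event, the warming query stayed on the local node.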
Re: Warming queries and Solr Cloud - just curious ...
Ah, interesting. Forgot about doing that issue entirely. - Mark On Mar 27, 2013, at 6:25 PM, Joel Bernstein joels...@gmail.com wrote: I ran a quick test and distrib=false is being tacked on automatically. Here is the log record: INFO: [collection1] webapp=null path=null params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1 status=0 QTime=17 So I think this is OK. On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com wrote: This jira looks like it addresses this. https://issues.apache.org/jira/browse/SOLR-3081 I'll run a quick test. On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote: In our case, yes - same non-distrib query is warmed on each node. Seems like you'd need something a little more dynamic than statically configured warming queries in solrconfig.xml for targeting specific shards. Tim On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote: This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? 
Cheers, Tim -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
Re: Warming queries and Solr Cloud - just curious ...
lol - you know you're a bad ass when you've forgotten more about Solr cloud than the rest of us know ;-) On Wed, Mar 27, 2013 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote: Ah, interesting. Forgot about doing that issue entirely. - Mark On Mar 27, 2013, at 6:25 PM, Joel Bernstein joels...@gmail.com wrote: I ran a quick test and distrib=false is being tacked on automatically. Here is the log record: INFO: [collection1] webapp=null path=null params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1 status=0 QTime=17 So I think this is OK. On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com wrote: This jira looks like it addresses this. https://issues.apache.org/jira/browse/SOLR-3081 I'll run a quick test. On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote: In our case, yes - same non-distrib query is warmed on each node. Seems like you'd need something a little more dynamic than statically configured warming queries in solrconfig.xml for targeting specific shards. Tim On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote: This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. 
If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
Re: Warming queries and Solr Cloud - just curious ...
That was a good fix Mark. I had this picture in my head of a large Solr Cloud sending around thousands of simultaneous searches and crashing itself. On Wed, Mar 27, 2013 at 6:47 PM, Timothy Potter thelabd...@gmail.com wrote: lol - you know you're a bad ass when you've forgotten more about Solr cloud than the rest of us know ;-) On Wed, Mar 27, 2013 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote: Ah, interesting. Forgot about doing that issue entirely. - Mark On Mar 27, 2013, at 6:25 PM, Joel Bernstein joels...@gmail.com wrote: I ran a quick test and distrib=false is being tacked on automatically. Here is the log record: INFO: [collection1] webapp=null path=null params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1 status=0 QTime=17 So I think this is OK. On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com wrote: This jira looks like it addresses this. https://issues.apache.org/jira/browse/SOLR-3081 I'll run a quick test. On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote: In our case, yes - same non-distrib query is warmed on each node. Seems like you'd need something a little more dynamic than statically configured warming queries in solrconfig.xml for targeting specific shards. Tim On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote: This is interesting. I'm looking into doing something similar too. Quick question: Would you be targeting each of the shards with exactly the same set of queries? On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... 
so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks -- Joel Bernstein Professional Services LucidWorks
Re: Query on all dynamic fields or wildcard field query
No, but you can use the dismax feature of the dismax and edismax query parsers to specify a static list of any number of fields to be searched for terms in a query that do not have an explicit field specified. And, no harm filing a Jira to request support for a wildcard field search feature. -- Jack Krupansky -Original Message- From: Luis Lebolo Sent: Wednesday, March 27, 2013 5:08 PM To: solr-user Subject: Query on all dynamic fields or wildcard field query Hi All, First I have to apologize and admit that I'm asking this question before doing any real research =( Was hoping for some preliminary help before I start this endeavor tomorrow. So here goes: Can I query for a value in multiple (wildcarded) fields? For example, if I have dynamic fields fieldName_someToken (e.g. fieldName_1, fieldName_2, fieldName_3), can I construct a query like fieldName_*:someValue? The query itself doesn't work, but is there a way to query numerous dynamic fields without explicitly listing them? Thanks, Luis
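To make the suggestion concrete, here is a minimal Python sketch that builds the request parameters for an edismax query over an explicit list of fields. It assumes the field list is passed through the `qf` parameter of the dismax/edismax parsers; the field names follow the fieldName_* pattern from the question and the helper name `edismax_params` is hypothetical. The client still has to know the field names up front, since no wildcard is expanded.

```python
from urllib.parse import urlencode

def edismax_params(value, field_names):
    """Build request parameters that search `value` across an explicit field list."""
    return {
        "defType": "edismax",
        "q": value,
        # qf takes a space-separated list of fields; wildcards are not expanded here
        "qf": " ".join(field_names),
    }

# Hypothetical dynamic fields matching the fieldName_* pattern from the question
fields = ["fieldName_%d" % i for i in range(1, 4)]
params = edismax_params("someValue", fields)
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a select URL; how the list of existing dynamic field names is obtained is left to the application.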
Re: Solr index Backup and restore of large indexs
Hi, Are you running Solr Cloud or Master/Slave? I'm assuming with 1TB a day you're sharding. With master/slave you can configure incremental index replication to another core. The backup core can be local on the server, on a separate server or in a separate data center. With Solr Cloud replicas can be set up to automatically have redundant copies of the index. These copies though are live copies and will handle queries. Replicating data to a separate data center is typically not done through Solr Cloud replication. Joel On Mon, Mar 25, 2013 at 11:43 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Try something like this: http://host/solr/replication?command=backup See: http://wiki.apache.org/solr/SolrReplication Otis -- Solr ElasticSearch Support http://sematext.com/ On Thu, Mar 21, 2013 at 3:23 AM, Sandeep Kumar Anumalla sanuma...@etisalat.ae wrote: Hi, We are loading approximately 1TB of index data daily. Please let me know the best procedure to take backup and restore of the indexes. I am using Solr 4.2. Thanks Regards Sandeep A Ext : 02618-2856 M : 0502493820 -- Joel Bernstein Professional Services LucidWorks
Solr sorting and relevance
We are using solr for search on our ecommerce site that primarily sells clothing. We index search terms based on a title field and description field. We want to be able to sort by relevance and by how much inventory we have (there is a field for that). We have done some coding outside of Solr to try and achieve this but it causes the following problem. Let's take jeans and boots as an example. A customer might search on boots and solr returns a bunch of boots and jeans. The jeans are included because the description might contain some text like "pant legs fit easily over boots". Now if we have more inventory in the particular jeans than the boots solr returned, the user will get back a list that shows mostly jeans at top and then somewhere down the list boots will show up. There isn't a problem with the jeans showing up, but the boots should actually be displayed first, ordered by most inventory, and then the jeans can be somewhere at the bottom of the list. I want to eliminate the hacks that have been done to try to incorporate inventory, i.e. have solr return the results and not manipulate it in code. I hope I have explained the problem enough for you to get the gist of what I am trying to accomplish. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrcloud 4.1 Collection with multiple slices only use
First, three documents isn't enough to really test. The formula for assigning shards is to hash on the unique ID. It _is_ possible that all three just happened to land on the same shard. If you index all 32 docs in the example dir and they're all on the same shard, we should talk. Second, a regular query to the cluster will always search all the shards. Use distrib=false on the URL to restrict the search to just the node you fire the request at. Let us know if you index more docs and still see the problem. Best Erick On Wed, Mar 27, 2013 at 9:39 AM, Chris R corg...@gmail.com wrote: So - I must be missing something very basic here and I've gone back to the Wiki example. After setting up the two shard example in the first tutorial and indexing the three example documents, look at the shards in the Admin UI. The documents are stored in the index where the update was directed - they aren't distributed across both shards. Release notes state that the compositeId router is the default when using the numshards parameter? I want an even distribution of documents based on ID across all shards. Any suggestions on what I'm screwing up? Chris On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote: I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: Problem: After setting up a multiple collection Solrcloud 4.1 instance on seven servers, when I index the documents they aren't distributed across the index slices. It feels as though I don't actually have a cloud implementation, yet everything I see in the admin interface and zookeeper implies I do. I feel as if I'm overlooking something obvious, but have not been able to figure out what. Configuration: Seven servers and four collections, each with 12 slices (no replica shards yet). Zookeeper configured in a three node ensemble. When I send documents to Server1/Collection1 (which holds two slices of collection1), all the documents show up in a single index shard (core). Perhaps related, I have found it impossible to get Solr to recognize the server names with anything but a literal host=servername parameter in the solr.xml. hostname parameters, host files, network, dns, are all configured correctly. I have a Solr 4.0 single collection set up similarly and it works just fine. I'm using the same schema.xml and solrconfig.xml files on the 4.1 implementation with only the luceneMatchVersion changed to LUCENE_41. sample solr.xml from server1
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1" shareSchema="true" zkClientTimeout="6">
    <core collection="col201301" shard="col201301s04" instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01" dataDir="/solr/col201301/col201301s04sh01/data"/>
    <core collection="col201301" shard="col201301s11" instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01" dataDir="/solr/col201301/col201301s11sh01/data"/>
    <core collection="col201302" shard="col201302s06" instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01" dataDir="/solr/col201302/col201302s06sh01/data"/>
    <core collection="col201303" shard="col201303s01" instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01" dataDir="/solr/col201303/col201303s01sh01/data"/>
    <core collection="col201303" shard="col201303s08" instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01" dataDir="/solr/col201303/col201303s08sh01/data"/>
    <core collection="col201304" shard="col201304s03" instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01" dataDir="/solr/col201304/col201304s03sh01/data"/>
    <core collection="col201304" shard="col201304s10" instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01" dataDir="/solr/col201304/col201304s10sh01/data"/>
  </cores>
</solr>
Thanks Chris
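Erick's point about sample size can be put into numbers with a small Python sketch. It assumes an idealized hash that sends each document to any shard independently with equal probability; this is a simplification for illustration, not Solr's actual compositeId router, and the function name is hypothetical.

```python
def prob_all_on_one_shard(num_docs, num_shards):
    """Chance that every doc hashes to the same shard, assuming a uniform hash."""
    # The first doc may land anywhere; each remaining doc must match its shard.
    return (1.0 / num_shards) ** (num_docs - 1)

print(prob_all_on_one_shard(3, 2))   # three example docs, two shards
print(prob_all_on_one_shard(32, 2))  # all 32 example docs, two shards
```

Under that assumption, three documents end up together on one of two shards a quarter of the time, while for 32 documents the chance is 2 to the power of -31, which is why seeing all 32 on one shard would point to a real routing problem.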
Re: Warming queries and Solr Cloud - just curious ...
Tim: Unfortunately, due to the increase in spam pages from bots, we had to lock down the Solr wiki. Post a request for us to add your Wiki ID (and give us the ID!) to the list of authorized IDs and we'll get you added (just takes a second). Or send me (or Steve Rowe) a private e-mail if you'd prefer. Best Erick On Wed, Mar 27, 2013 at 4:03 PM, Timothy Potter thelabd...@gmail.com wrote: Ok - thanks for confirming Mark - I'll add that to the wiki. Cheers, Tim On Wed, Mar 27, 2013 at 1:59 PM, Mark Miller markrmil...@gmail.com wrote: Yup. You only want to warm locally. We should add that to the wiki. - Mark On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote: When running in SolrCloud mode, does it make sense to disable distributed mode for warming queries? i.e. distrib=false in my warming query config I actually asked this on Erik's informative Webinar this morning but had to drop off before I heard the answer ... so Erik might have answered this already ;-) My thinking here is that a hard commit gets sent around the cluster automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard commit, all 36 nodes will be warming up. If my warming queries are distributed, then all nodes are going to be sending the same query needlessly around the cluster 36 times - seems unnecessary. Thoughts? Cheers, Tim
Re: Solrcloud 4.1 Collection with multiple slices only use
I realized my error shortly afterwards: more docs, better spread. I continued to do some testing to see how I could manually lay out the shards in what I thought was a more organized manner and with more descriptive names than the numshards parameter alone produced. I also gen'd up a few thousand docs and schema to test with. Appreciate the help. - Reply message - From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Subject: Solrcloud 4.1 Collection with multiple slices only use Date: Wed, Mar 27, 2013 9:30 pm First, three documents isn't enough to really test. The formula for assigning shards is to hash on the unique ID. It _is_ possible that all three just happened to land on the same shard. If you index all 32 docs in the example dir and they're all on the same shard, we should talk. Second, a regular query to the cluster will always search all the shards. Use distrib=false on the URL to restrict the search to just the node you fire the request at. Let us know if you index more docs and still see the problem. Best Erick On Wed, Mar 27, 2013 at 9:39 AM, Chris R corg...@gmail.com wrote: So - I must be missing something very basic here and I've gone back to the Wiki example. After setting up the two shard example in the first tutorial and indexing the three example documents, look at the shards in the Admin UI. The documents are stored in the index where the update was directed - they aren't distributed across both shards. Release notes state that the compositeId router is the default when using the numshards parameter? I want an even distribution of documents based on ID across all shards. Any suggestions on what I'm screwing up? Chris On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote: I'm guessing you didn't specify numShards. Things changed in 4.1 - if you don't specify numShards it goes into a mode where it's up to you to distribute updates. - Mark On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote: I have two issues and I'm unsure if they are related: Problem: After setting up a multiple collection Solrcloud 4.1 instance on seven servers, when I index the documents they aren't distributed across the index slices. It feels as though I don't actually have a cloud implementation, yet everything I see in the admin interface and zookeeper implies I do. I feel as if I'm overlooking something obvious, but have not been able to figure out what. Configuration: Seven servers and four collections, each with 12 slices (no replica shards yet). Zookeeper configured in a three node ensemble. When I send documents to Server1/Collection1 (which holds two slices of collection1), all the documents show up in a single index shard (core). Perhaps related, I have found it impossible to get Solr to recognize the server names with anything but a literal host=servername parameter in the solr.xml. hostname parameters, host files, network, dns, are all configured correctly. I have a Solr 4.0 single collection set up similarly and it works just fine. I'm using the same schema.xml and solrconfig.xml files on the 4.1 implementation with only the luceneMatchVersion changed to LUCENE_41. sample solr.xml from server1
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8080" host="server1" shareSchema="true" zkClientTimeout="6">
    <core collection="col201301" shard="col201301s04" instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01" dataDir="/solr/col201301/col201301s04sh01/data"/>
    <core collection="col201301" shard="col201301s11" instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01" dataDir="/solr/col201301/col201301s11sh01/data"/>
    <core collection="col201302" shard="col201302s06" instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01" dataDir="/solr/col201302/col201302s06sh01/data"/>
    <core collection="col201303" shard="col201303s01" instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01" dataDir="/solr/col201303/col201303s01sh01/data"/>
    <core collection="col201303" shard="col201303s08" instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01" dataDir="/solr/col201303/col201303s08sh01/data"/>
    <core collection="col201304" shard="col201304s03" instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01" dataDir="/solr/col201304/col201304s03sh01/data"/>
    <core collection="col201304" shard="col201304s10" instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01" dataDir="/solr/col201304/col201304s10sh01/data"/>
  </cores>
</solr>
Thanks Chris
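One way to check the spread is to ask each core for its own document count with distrib=false, as Erick suggested. The Python sketch below only builds the URLs. The host, port and core names come from the solr.xml sample above, while the /solr context path, the /select handler path and the helper name `local_count_url` are assumptions for illustration.

```python
from urllib.parse import urlencode

def local_count_url(base_url, core):
    """URL asking a single core for its own document count only."""
    # rows=0 returns just the hit count; distrib=false keeps the query on this core
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return "%s/%s/select?%s" % (base_url, core, urlencode(params))

# Assumed base URL; the two cores are the collection col201301 slices on server1
for core in ["col201301s04sh01", "col201301s11sh01"]:
    print(local_count_url("http://server1:8080/solr", core))
```

Fetching each URL and comparing the numFound values across cores shows whether documents are actually being spread over the slices.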
Re: Querying a transitive closure?
Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote: Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But, my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas? Thanks Jack On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, Is this really about HTTP and Solr vs. SolrCloud or more whether Solr(Cloud) is the right tool for the job and if so how to structure the schema and queries to make such lookups efficient? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote: This is a question about isA? We want to know if M isA B isA?(M,B) For some M, one might be able to look into M to see its type or which class(es) for which it is a subClass. We're talking taxonomic queries now. But, for some M, one might need to ripple up the transitive closure, looking at all the super classes, etc, recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But, how do you do that in a SolrCloud? Really curious... Many thanks in advance for ideas. Jack
Re: Querying a transitive closure?
Hi Otis, That's essentially the answer I was looking for: each shard (are we talking master + replicas?) has the plug-in custom query handler. I need to build it to find out. What I mean is that there is a taxonomy, say one with a single root for sake of illustration, which grows all the classes, subclasses, and instances. If I have an object that is somewhere in that taxonomy, then it has a zigzag chain of parents up that tree (I've seen that called a transitive closure). If class B is way up that tree from M, no telling how many queries it will take to find it. Hmmm... recursive ascent, I suppose. Many thanks Jack On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, I don't fully understand the exact taxonomy structure and your needs, but in terms of reducing the number of HTTP round trips, you can do it by writing a custom SearchComponent that, upon getting the initial request, does everything locally, meaning that it talks to the local/specified shard before returning to the caller. In a SolrCloud setup with N shards, each of these N shards could be queried in such a way in parallel, running query/queries on their local shards. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote: Hi Otis, I fully expect to grow to SolrCloud -- many shards. For now, it's solo. But, my thinking relates to cloud. I look for ways to reduce the number of HTTP round trips through SolrJ. Maybe you have some ideas? Thanks Jack On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Jack, Is this really about HTTP and Solr vs. SolrCloud or more whether Solr(Cloud) is the right tool for the job and if so how to structure the schema and queries to make such lookups efficient? Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote: This is a question about isA? 
We want to know if M isA B: isA?(M,B). For some M, one might be able to look into M to see its type or the class(es) of which it is a subclass. We're talking taxonomic queries now. But for some M, one might need to ripple up the transitive closure, looking at all the superclasses, etc., recursively. It seems unreasonable to do that over HTTP; it seems more reasonable to grab a core and write a custom isA query handler. But how do you do that in SolrCloud? Really curious... Many thanks in advance for ideas. Jack
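For illustration only, the "ripple up the transitive closure" step from this thread can be sketched as a breadth-first walk over parent links. This is a minimal sketch, not Solr code: the `parents` dictionary is a hypothetical in-memory stand-in for whatever lookup a custom handler would run against its local index, and the node names are made up.

```python
from collections import deque


def is_a(m, b, parents):
    """Return True if b is m itself or any ancestor of m.

    `parents` maps a node to its direct parents (hypothetical stand-in
    for a lookup against the local index).
    """
    seen = {m}
    queue = deque([m])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        # Climb one level; `seen` guards against revisiting shared ancestors.
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


# Hypothetical toy taxonomy: child -> list of direct parents.
taxonomy = {
    "M": ["K"],
    "K": ["E", "F"],
    "E": ["B"],
    "F": ["Root"],
    "B": ["Root"],
}

print(is_a("M", "B", taxonomy))
print(is_a("B", "M", taxonomy))
```

Each step of the loop corresponds to one parent lookup, which is why running the walk next to the index rather than over HTTP removes one round trip per level of the taxonomy.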
Re: Solr sorting and relevance
It sounds like you might be able to get the mix you want with three different boosts: 1) High boost on title 2) Lower boost on description 3) Function query boost on inventory. The high boost on title will help push products with matches in the title to the top. The function query boost on inventory will help move products with higher inventory to the top. You can also use the QueryElevationComponent to move specific docs to the top for specific queries, but this might not be effective for your use case. There is also a patch (SOLR-4465) which is experimental at this point but is designed to let people move custom sort algorithms into Solr through custom collectors. This is an advanced approach and would take a strong understanding of Lucene collectors. On Wed, Mar 27, 2013 at 9:02 PM, scallawa dami...@altrec.com wrote: We are using Solr for search on our ecommerce site, which primarily sells clothing. We index search terms based on a title field and a description field. We want to be able to sort by what is most relevant and by what we have more inventory of (there is a field for that). We have done some coding outside of Solr to try to achieve this, but it causes the following problem. Let's take jeans and boots as an example. A customer might search on boots and Solr returns a bunch of boots and jeans. The jeans are included because the description might contain some text like "pant legs fit easily over boots". Now if we have more inventory of the particular jeans than of the boots Solr returned, the user will get back a list that shows mostly jeans at the top, and then somewhere down the list boots will show up. There isn't a problem with the jeans showing up, but the boots should actually be displayed first, with the ones having the most inventory at the top, and then the jeans can be somewhere at the bottom of the list. I want to eliminate the hacks that have been done to try to incorporate inventory, i.e. have Solr return the results and not manipulate them in code. 
I hope I have explained the problem enough for you to get the gist of what I am trying to accomplish. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918.html Sent from the Solr - User mailing list archive at Nabble.com. -- Joel Bernstein Professional Services LucidWorks
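As a rough illustration of the three boosts suggested in this thread, the request parameters might be assembled as below. This is a sketch under assumptions, not a tested configuration: it assumes the extended DisMax parser (`defType=edismax`), fields named `title`, `description`, and `inventory` (taken from the description in the question, not from an actual schema), and boost weights that are guesses to be tuned. It uses the additive `bf` parameter so that a product with zero inventory still keeps its text-match score.

```python
from urllib.parse import urlencode


def build_search_params(user_query, title_boost=10, description_boost=2):
    """Build request parameters for the three-boost approach.

    Field names and weights are assumptions for illustration only.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        # 1) high boost on title, 2) lower boost on description
        "qf": f"title^{title_boost} description^{description_boost}",
        # 3) additive function boost; log() dampens very large inventories
        "bf": "log(sum(inventory,1))",
        "fl": "*,score",
    }


params = build_search_params("boots")
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to the core's select URL. Returning `score` in `fl` makes it easier to check whether title matches on "boots" actually outrank description-only matches on jeans before the inventory boost is adjusted.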
Re: Could not load config for solrconfig.xml
Hi Hoss, Thank you for replying to my question. The solrconfig.xml in the example-DIH directory of the Solr download is exactly the same as the one at the links you posted in your reply, so where is the big difference? I think I made a mistake in my last question: instead of saying db-data-config.xml, I said solrconfig.xml. But I still do not understand where that exception comes from. Your help will be appreciated. Abdel. From: Chris Hostetter hossman_luc...@fucit.org To: gene...@lucene.apache.org gene...@lucene.apache.org; A. Lotfi majidna...@yahoo.com Sent: Wednesday, March 27, 2013 6:00 PM Subject: Re: Could not load config for solrconfig.xml 1) the email list you want to be using is solr-user@lucene, not general@lucene 2) there is a big difference between solrconfig.xml (which controls in general how solr works for managing a SolrCore) and the config files for DIH (which can be used to tell Solr where/how to fetch data to index), typically called data-config.xml (but you can name them anything you want). What you have described below is a data config file for DIH; if you are trying to use it as a solrconfig.xml file you aren't going to get very far. I suggest you take a gander at the example config set for using DIH with a database... https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/ https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/db/conf/ https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/db/conf/solrconfig.xml?view=markup https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/db/conf/db-data-config.xml?view=log ...and keep them in mind while reviewing the DIH docs... http://wiki.apache.org/solr/DataImportHandler -Hoss
Could not load config for solrconfig.xml
Hi, I am trying Solr with an Oracle database. It is working, but an exception is shown at the top of the page: SolrCore Initialization Failures solr: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load config for solrconfig.xml Here is my db-data-config.xml:
<dataConfig>
  <dataSource driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@ourIPaddress:1521:ourDB" user="username" password="password"/>
  <document>
    <entity name="residential" query="select * from tsunami.consumer_data_01 where state='MA'" deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified &gt; '${dataimporter.last_index_time}'">
      <field column="LEMSMATCHCODE" name="lemsmatchcode" />
      <field column="STREETNAME" name="streetname" />
    </entity>
  </document>
</dataConfig>
Thanks, your help is appreciated.
Re: Too many fields to Sort in Solr
Hi Joel, you are correct: the boost function populates the field cache. I was not aware of docValues, so while trying the example you provided I see the following error when I define the field type: Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is configured with a docValues format, but the codec does not support it: class org.apache.solr.core.SolrCore$3 at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854) at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719) ... 13 more My field definition: <fieldType name="dvLong" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/> What am I missing here? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4051960.html Sent from the Solr - User mailing list archive at Nabble.com.