Re: best load balancer for solr cloud
Thanks Shawn, Amey. Is any specific configuration needed for CloudSolrServer? I've seen increased latency when using it. Also, does ConcurrentUpdateSolrServer do discovery by itself, the way CloudSolrServer does?

On Mon, Oct 13, 2014 at 7:53 PM, Shawn Heisey apa...@elyograg.org wrote:

On 10/13/2014 5:28 AM, Apoorva Gaurav wrote: Is it preferable to use CloudSolrServer or an external load balancer like haproxy? We're currently channeling all our requests via haproxy and want to get rid of the management hassle as well as the additional network call, but we saw a significant degradation in latency on switching to CloudSolrServer. Please suggest.

If your client is Java, then there's no contest: use CloudSolrServer. It can react almost instantly to cluster changes, whereas a load balancer must complete a health-check cycle before it knows about machines coming up or going down.

The other reply that you received mentioned ConcurrentUpdateSolrServer. This is a high-performance option, but it comes at a cost -- your application will never be informed about any indexing errors. Even if the index requests all fail, your application will never know.

Thanks, Shawn

-- Thanks Regards, Apoorva
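For reference, wiring up CloudSolrServer in SolrJ 4.x amounts to handing it the ZooKeeper ensemble address rather than a list of Solr nodes. A minimal sketch follows; the host names, chroot, and collection name are made-up placeholders, and the SolrJ calls are shown in comments because they need a live cluster (and solr-solrj on the classpath) to run:

```java
public class ZkHosts {
    // Builds the zkHost string that CloudSolrServer's constructor expects:
    // comma-separated ensemble members, optionally followed by a chroot.
    public static String zkHost(String chroot, String... hostPorts) {
        return String.join(",", hostPorts) + chroot;
    }

    public static void main(String[] args) {
        String zk = zkHost("/solr", "zk1:2181", "zk2:2181", "zk3:2181");
        // With a running SolrCloud cluster:
        // CloudSolrServer server = new CloudSolrServer(zk);
        // server.setDefaultCollection("collection1");
        // server.connect(); // fails fast if ZooKeeper is unreachable
        System.out.println(zk);
    }
}
```

Because the client watches the cluster state in ZooKeeper directly, node up/down events propagate without the health-check delay a load balancer would add.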
best load balancer for solr cloud
Hello All, Is it preferable to use CloudSolrServer or an external load balancer like haproxy? We're currently channeling all our requests via haproxy and want to get rid of the management hassle as well as the additional network call, but we saw a significant degradation in latency on switching to CloudSolrServer. Please suggest. -- Thanks Regards, Apoorva
Disable caching in sort
Hello All, We are trying to provide a personalized sort order for each user. We have a pre-computed list of products per user, and when those products appear in the Solr result set they need to be shown up front. One way is to handle this in the application, but pagination becomes tricky. Another way we are exploring is a custom value source: we pass the product ids in, compute a custom score from them, and sort on that. We've been able to manipulate the result set this way, but the sort order is getting cached. Using {!cache=false} would avoid that, but it would degrade performance. Is there any other way of achieving this? -- Thanks Regards, Apoorva
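For comparison, the application-side option mentioned above can be sketched in plain Java. Everything here is illustrative, not an actual implementation: fetch a page of ids from Solr, then move the user's pre-computed products to the front while preserving Solr's order for the rest.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PersonalizedOrder {
    /**
     * Moves the ids in `pinned` (in their pre-computed order) to the front
     * of `solrOrder`, keeping Solr's relevance order for everything else.
     * Pinned ids that did not match the query are simply skipped.
     */
    public static List<String> reorder(List<String> solrOrder, List<String> pinned) {
        Set<String> inResults = new LinkedHashSet<>(solrOrder);
        List<String> out = new ArrayList<>();
        for (String id : pinned) {
            if (inResults.contains(id)) {
                out.add(id);
            }
        }
        for (String id : solrOrder) {
            if (!out.contains(id)) {
                out.add(id);
            }
        }
        return out;
    }
}
```

Pagination is indeed the tricky part: to serve page N of the merged order, the application has to over-fetch from Solr (at least N pages plus the number of pinned products) and slice the merged list itself.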
Re: Help on custom sort
Try using a custom value source parser and pass Solr the formula for computing the price; something like this: http://java.dzone.com/articles/connecting-redis-solr-boosting

On Mon, Sep 22, 2014 at 1:38 AM, Scott Smith ssm...@mainstreamdata.com wrote:

There are likely several hundred groups. Also, new groups will be added and some groups will be deleted. So I don't think putting a field in the docs works; having to add a new group price to 100 million+ documents doesn't seem reasonable. Right now I'm looking at http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html. This references a much older version of Solr (the blog is from 2011), so I will need to update the classes it mentions.

-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, September 20, 2014 11:58 AM To: solr-user@lucene.apache.org Subject: Re: Help on custom sort

How many different groups are there? And can user A ever be part of more than one group? If (1) there are a reasonably small number of groups (under 100 or so, as a place to start) and (2) a user is always part of a single group, then you could store separate prices in each document by group. You'd have fields like price_group_a: $100, price_group_b: $101, and sorting becomes trivial: you just specify a sort on price_group_a for users in group A, etc. If the number of groups is unknown-but-not-huge, dynamic fields could be used.

If that's not the case, then you might be able to get clever with sorting by function; here's a place to start: https://cwiki.apache.org/confluence/display/solr/Function+Queries These can be arbitrarily complex, but I'm thinking something where the price returned by the function respects the group the user is in, perhaps even the min/max of all the groups the user is in. I admit I haven't really thought that through well though...
Best, Erick

On Sat, Sep 20, 2014 at 9:26 AM, Scott Smith ssm...@mainstreamdata.com wrote:

I need to provide a custom sort option for sorting by price and I would like some suggestions. It's not the straightforward "just sort by a price field in the document" scenario, or I wouldn't be asking for help. Here's the scenario I'm dealing with.

I have 100 million+ documents (so multi-sharded). Users search for documents they are interested in using a standard keyword search, then purchase the documents they want. So far, nothing hard.

Here's where things get interesting. The documents come from multiple suppliers. Each supplier sets a price for his documents, and different suppliers provide different pricing. That wouldn't be difficult except that *users* are divided into groups, and depending on which group a user is in, the supplier charges a different price. So user A may pay one price for a document and user B a different price for the same document, just because they are in different groups. I don't even know if the relative order of pricing is the same between groups (e.g., if document X is more expensive than document Y for a user in group M, it may not be for a user in group N).

The one thing that may make this doable is that supplier 1 will likely have the same price for all of his documents within each user group. So a user in group A pays the same price regardless of which document he buys from supplier 1; a user in group B also pays a single price for any document from supplier 1, just likely a different price than a user in group A pays. Within a supplier, the price varies by user group, not by document.

To summarize, one of the requirements for the system is that we provide the ability to sort search results by price. This would be easy except that the price a user pays depends not only on what he wants to buy but also on the group he is in. I suspect there is some kind of custom Solr module I'm going to have to write. I'm thinking the user group gets passed in as a custom Solr parameter (I'm assuming that's possible?), and that there has to be some kind of in-memory database that tracks pricing by user group and document supplier. I'm happy to go read code, documents, links, etc. if someone can point me in the right direction. What kind of Solr module am I likely going to write (or extend), and are there examples somewhere? Maybe there's a way to do this without having to extend a Solr module? Hope this makes sense. Any help is appreciated.

Scott

-- Thanks Regards, Apoorva
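The "in-memory database keyed by user group and document supplier" that Scott describes could start as simply as the sketch below. All names are hypothetical; in a real deployment this lookup would sit behind a custom ValueSourceParser or a sort-by-function rather than being applied client-side.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupPricing {
    // (supplier, group) -> price; per the thread, a supplier charges
    // one price per user group, not one price per document.
    private final Map<String, Double> prices = new HashMap<>();

    private static String key(String supplier, String group) {
        return supplier + "|" + group;
    }

    public void setPrice(String supplier, String group, double price) {
        prices.put(key(supplier, group), price);
    }

    public double priceFor(String supplier, String group) {
        Double p = prices.get(key(supplier, group));
        return p != null ? p : Double.MAX_VALUE; // unknown combos sort last
    }

    /** Sorts doc ids by the price the given group pays, cheapest first. */
    public List<String> sortBySupplierPrice(Map<String, String> docSupplier,
                                            String group) {
        List<String> ids = new ArrayList<>(docSupplier.keySet());
        ids.sort(Comparator.comparingDouble(
                id -> priceFor(docSupplier.get(id), group)));
        return ids;
    }
}
```

This stays small because, as described above, the table needs only one entry per (supplier, group) pair rather than one per document, so several hundred groups times the supplier count is easily held in memory.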
Re: wrong docFreq while executing query based on uniqueKey-field
I faced the same issue some time back. The root cause is docs getting deleted and created again without the index being optimized afterwards. Here is the discussion: http://www.signaldump.org/solr/qpod/22731/docfreq-coming-to-be-more-than-1-for-unique-id-field

On Tue, Jul 22, 2014 at 4:56 PM, Johannes Siegert johannes.sieg...@marktjagd.de wrote:

Hi. My Solr index (version 4.7.2) has an id field:

<field name="id" type="string" indexed="true" stored="true"/>
...
<uniqueKey>id</uniqueKey>

The index is updated once per hour. I use the following query to retrieve some documents: q=id:2^2 id:1^1 I would expect document 2 always to appear before document 1, but after many index updates document 1 comes before document 2. With debug=true I could see the problem: document 1 has docFreq=2, while document 2 has docFreq=1. How can the docFreq of the uniqueKey field be higher than 1? Could anyone explain this behavior to me? Thanks! Johannes

-- Thanks Regards, Apoorva
Re: External File Field eating memory
Thanks Kamal.

On Wed, Jul 16, 2014 at 11:43 AM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:

Hi Apoorva, This was my master server replication configuration (core/conf/solrconfig.xml):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">../data/external_eff_views</str>
  </lst>
</requestHandler>

Only configuration files can be replicated, so with the above config the external file was getting replicated to core/conf/data/external_eff_views. But Solr reads the external file from core/data/external_eff_views, so the file was not ending up in the right place. Therefore I did not opt for replicating the EFF file. The second thing is that whenever there is a change in configuration files, the core reloads itself to reflect the changes; I am not sure you can disable this reloading. In the end, I created the files on the slaves a different way.

Thanks Kamal

On Tue, Jul 15, 2014 at 11:00 AM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:

Hey Kamal, What config changes have you made to replicate the external files, and how have you disabled core reloading?

On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:

Hi All, It turned out that the external file, which was being replicated every 10 minutes, was reloading the core as well. This was increasing the query time. Thanks Kamal Kishore

On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:

With the above replication configuration, the EFF file is getting replicated to core/conf/data/external_eff_views (a new "data" dir is created inside conf), but it is not getting replicated to core/data/external_eff_views on the slave. Please help.

On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:

Thanks for your guidance, Alexandre Rafalovitch. I am looking into this seriously. Another question: I am facing an error in replication of the EFF file. This is the master replication configuration (core/conf/solrconfig.xml):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">../data/external_eff_views</str>
  </lst>
</requestHandler>

The EFF file is present at core/data/external_eff_views.

On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: This might be related: https://issues.apache.org/jira/browse/SOLR-3514

On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:

Hi Team, I have recently implemented an external file field (EFF) in Solr. There are about 1.5 lakh (unsorted) values in the external file. Since this implementation the server has become slow, and the Solr query time has also increased. Can anybody confirm whether these issues are caused by this implementation? Does the EFF eat up memory?

Regards Kamal Kishore

-- Regards, Shalin Shekhar Mangar.

-- Thanks Regards, Apoorva

-- Thanks Regards, Apoorva
Re: Sort not working in solr
In fact it's better to use TrieIntField instead of IntField:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3ccab_8yd9yp259kk4ciybbprjcpwqp6vd7yvrtjr1eubew_ky...@mail.gmail.com%3E
http://stackoverflow.com/questions/13372323/what-is-the-correct-solr-fieldtype-to-use-for-sorting-integer-values

On Tue, Jul 15, 2014 at 1:09 PM, スガヌマヨシカズ suganoo2...@gmail.com wrote:

I think type=text_general causes a character-by-character (lexicographic) sort for the numbers. How about using type=int or type=long instead of text_general?

<field name="business_point" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>

Regards, suganuma

2014-07-15 16:24 GMT+09:00 madhav bahuguna madhav.bahug...@gmail.com:

I am trying to sort my records, but the result I get is not correct. My url query: http://localhost:8983/solr/select/?q=*:*&fl=business_point&sort=business_point+desc

I am trying to sort my records by business_point, but the result comes out like this: 9 8 7 6 5 45 4 4 10 1. Why am I getting my results in the wrong order? My schema looks like this:

<field name="business_point" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>

-- Regards Madhav Bahuguna

-- Thanks Regards, Apoorva
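A schema.xml sketch of the suggested fix (the tint type definition follows the stock Solr 4.x example schema; the field must be reindexed after the type change):

```xml
<!-- precisionStep="0" indexes a single term per value, which is all sorting needs -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

<field name="business_point" type="tint" indexed="true" stored="true" required="false" multiValued="false"/>
```

With a numeric type, sort=business_point+desc compares values numerically, so 45 sorts above 10 and 9 rather than between 5 and 4.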
Re: docFreq coming to be more than 1 for unique id field
Hello Markus, Ahmet, Forgot to update the thread: optimization works, i.e. after optimizing, all unique keys have docFreq 1.

On Wed, Jun 18, 2014 at 1:58 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: text in it, query is of the type keywords:(word1 OR word2 ... OR wordN). : The client is relying on default relevancy based sort returned by solr. : Some documents can get penalised because of some other documents which were : deleted. Is this functionality correct?

yes, because term stats are over the entire index including deleted documents still in segments -- information about deletions isn't purged from the index until a segment is merged and the stats are recomputed over the docs/terms in the new segment. The only way to get those types of statistics at request time such that they were *not* affected by deleted documents would involve scanning every doc to compute them -- which would defeat the point of having the inverted index.

-Hoss http://www.lucidworks.com/

-- Thanks Regards, Apoorva
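The effect Hoss describes can be illustrated with a toy model of a single segment (a deliberate simplification, not Lucene's actual data structures): an update is a delete plus a re-add, the delete only marks the old document, and the term's docFreq keeps counting it until a merge rewrites the segment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToySegment {
    // Internal doc numbers whose postings contain some term (e.g. id:42).
    private final List<Integer> postings = new ArrayList<>();
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDoc = 0;

    /** Adds a document containing the term; returns its internal number. */
    public int add() {
        postings.add(nextDoc);
        return nextDoc++;
    }

    /** A delete only marks the doc; the posting stays in the segment. */
    public void delete(int doc) {
        deleted.add(doc);
    }

    /** What scoring sees: deleted docs are still counted. */
    public int docFreq() {
        return postings.size();
    }

    /** A merge drops deleted docs and recomputes the statistics. */
    public ToySegment merge() {
        ToySegment merged = new ToySegment();
        for (int d : postings) {
            if (!deleted.contains(d)) merged.add();
        }
        return merged;
    }
}
```

Re-adding a doc with the same unique key therefore leaves docFreq at 2 until a merge (or an optimize, which merges everything) rewrites the segment, matching the observation in this thread that optimizing brings docFreq back to 1.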
docFreq coming to be more than 1 for unique id field
Hello All, We are using Solr 4.4.0. We have a uniqueKey of type solr.StrField. We need to retrieve docs in a pre-defined order if they match a certain condition. Our query is of the format uniqueField:(id1^weight1 OR id2^weight2 ... OR idN^weightN) where weight1 > weight2 > ... > weightN. But the result is not in the desired order. On debugging the query we've found that for some of the documents docFreq is higher than 1, and hence their tf-idf based score is lower than others'. What can be the reason behind a unique id field having docFreq greater than 1? How can we prevent it? -- Thanks Regards, Apoorva
Re: docFreq coming to be more than 1 for unique id field
Yes, we have updates on these docs. Didn't try optimizing; will do. But isn't the unique field supposed to be unique?

On Tue, Jun 17, 2014 at 8:37 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi, Just a guess: do you have deletions? What happens when you optimize and re-try?

On Tuesday, June 17, 2014 5:58 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:

Hello All, We are using Solr 4.4.0. We have a uniqueKey of type solr.StrField. We need to retrieve docs in a pre-defined order if they match a certain condition. Our query is of the format uniqueField:(id1^weight1 OR id2^weight2 ... OR idN^weightN) where weight1 > weight2 > ... > weightN. But the result is not in the desired order. On debugging the query we've found that for some of the documents docFreq is higher than 1, and hence their tf-idf based score is lower than others'. What can be the reason behind a unique id field having docFreq greater than 1? How can we prevent it?

-- Thanks Regards, Apoorva

-- Thanks Regards, Apoorva
Re: docFreq coming to be more than 1 for unique id field
Will try optimizing and then respond to the thread.

On Tue, Jun 17, 2014 at 8:47 PM, Markus Jelsma markus.jel...@openindex.io wrote:

Yes, it is unique, but deleted docs are not immediately purged, only when optimized or forceMerged, or during regular segment merges. The problem is that until then they keep messing with the statistics.

-Original message- From: Apoorva Gaurav apoorva.gau...@myntra.com Sent: Tuesday 17th June 2014 17:16 To: solr-user solr-user@lucene.apache.org; Ahmet Arslan iori...@yahoo.com Subject: Re: docFreq coming to be more than 1 for unique id field

Yes, we have updates on these docs. Didn't try optimizing; will do. But isn't the unique field supposed to be unique?

On Tue, Jun 17, 2014 at 8:37 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi, Just a guess: do you have deletions? What happens when you optimize and re-try?

On Tuesday, June 17, 2014 5:58 PM, Apoorva Gaurav apoorva.gau...@myntra.com wrote:

Hello All, We are using Solr 4.4.0. We have a uniqueKey of type solr.StrField. We need to retrieve docs in a pre-defined order if they match a certain condition. Our query is of the format uniqueField:(id1^weight1 OR id2^weight2 ... OR idN^weightN) where weight1 > weight2 > ... > weightN. But the result is not in the desired order. On debugging the query we've found that for some of the documents docFreq is higher than 1, and hence their tf-idf based score is lower than others'. What can be the reason behind a unique id field having docFreq greater than 1? How can we prevent it?

-- Thanks Regards, Apoorva

-- Thanks Regards, Apoorva
Re: docFreq coming to be more than 1 for unique id field
Currently we are not using SolrJ but are simply interacting with Solr via JSON over HTTP; this will change in a couple of months, but we're not there yet. As of now we put all the logic in query building, query Solr with it, and pass the JSON it returns straight to the front end. I know this is not the ideal approach, but that's what we have at the moment. Hence the need for a way to deterministically order the result set, provided the docs match the other search criteria.

On Tue, Jun 17, 2014 at 10:28 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

All index-wide statistics (like the docFreq of each term) are over the entire index, which includes deleted docs -- because it's an *inverted* index, it's not feasible to update those statistics to account for deleted docs (that would basically kill all the performance advantages that come from having an inverted index).

: uniqueField:(id1^weight1 OR id2^weight2 ... OR idN^weightN) : where weight1 > weight2 > ... > weightN : : But the result is not in the desired order. On debugging the query we've

if you are requesting a small number of docs, and all the docs you are requesting are returned in a single request, why do you care what order they are in? Why not just put them in the order you want on the client? That would not only make your Solr request simpler, but would almost certainly be a bit *faster*, since you could sort exactly as you wanted without needing to compute a complex score that you don't actually care about.

-Hoss http://www.lucidworks.com/

-- Thanks Regards, Apoorva
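Hoss's suggestion -- drop the boosts, fetch the matching ids in default order, and re-sort on the client -- is only a few lines. The helper below is illustrative, not part of SolrJ; it reuses the same weights that were previously being sent as boosts:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ClientSideOrder {
    /**
     * Reorders the ids returned by Solr into the caller's desired order.
     * Higher weight = earlier in the result; unknown ids sort last.
     */
    public static List<String> byWeight(List<String> returned,
                                        Map<String, Double> weights) {
        List<String> out = new ArrayList<>(returned);
        out.sort((a, b) -> Double.compare(
                weights.getOrDefault(b, 0.0),   // descending by weight
                weights.getOrDefault(a, 0.0)));
        return out;
    }
}
```

This sidesteps the docFreq problem entirely, since the final order no longer depends on tf-idf scoring at all.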
Re: docFreq coming to be more than 1 for unique id field
OK, let's for a moment forget this specific use case and consider a more general one. Say the field name is keywords and we are storing text in it, and the query is of the type keywords:(word1 OR word2 ... OR wordN). The client is relying on the default relevancy-based sort returned by Solr. Some documents can get penalised because of other documents that were deleted. Is this behavior correct?

On Wed, Jun 18, 2014 at 12:52 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Currently we are not using SolrJ but are simply interacting with solr with : json over http, this will change in a couple of months but currently not : there. As of now we are putting all the logic in query building, using it : to query solr and then passing on the json returned by it to front end. I : know this is not the ideal approach, but that's what we have at the moment. : Hence need a way of deterministically order the result set provided they : match other search criteria.

Whether you are using SolrJ or not doesn't really change my point at all -- you are jumping through all sorts of hoops, and asking Solr to jump through all sorts of hoops, for a score you don't actually care about, and it isn't going to work perfectly for what you want anyway because of the fundamental nature of the inverted-index stats, leading you to look for even smaller, higher hoops to try to jump through. It would be far simpler to just ask Solr for the exact set of N documents you want in default order, re-order the resulting documents in the magic order you already know and care about, and then give that modified response to your front end.

-Hoss http://www.lucidworks.com/

-- Thanks Regards, Apoorva