Re: best load balancer for solr cloud

2014-10-14 Thread Apoorva Gaurav
Thanks Shawn, Amey,

Is any specific configuration needed for CloudSolrServer? I've seen
increased latency when using it. Also, does ConcurrentUpdateSolrServer do
discovery itself, like CloudSolrServer?

On Mon, Oct 13, 2014 at 7:53 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 10/13/2014 5:28 AM, Apoorva Gaurav wrote:
  Is it preferable to use CloudSolrServer or an external load balancer
  like haproxy? We're currently channeling all our requests via haproxy but
  want to get rid of the management hassles as well as the additional network
  call. However, we saw a significant degradation in latency on switching to
  CloudSolrServer. Please suggest.

 If your client is Java, then there's no contest.  Use CloudSolrServer.
 It can react almost instantly to cluster changes.  A load balancer will
 need to do a health-check cycle before it knows about machines coming up
 or going down.

 The other reply that you received mentioned ConcurrentUpdateSolrServer.
  This is a high-performance option, but it comes at a cost -- your
 application will never be informed about any indexing errors.  Even if
 the index requests all fail, your application will never know.

 Thanks,
 Shawn




-- 
Thanks & Regards,
Apoorva


best load balancer for solr cloud

2014-10-13 Thread Apoorva Gaurav
Hello All,

Is it preferable to use CloudSolrServer or an external load balancer
like haproxy? We're currently channeling all our requests via haproxy but
want to get rid of the management hassles as well as the additional network
call. However, we saw a significant degradation in latency on switching to
CloudSolrServer. Please suggest.

-- 
Thanks & Regards,
Apoorva


Disable caching in sort

2014-09-21 Thread Apoorva Gaurav
Hello All,

We are trying to provide a personalized sort order for each user. We have a
pre-computed list mapping users to products, and if any of those products
appear in the Solr result set they need to be shown up front. One way is to
handle this in the application, but pagination becomes tricky. Another way we
are exploring is a custom value source where we pass the product IDs to a
custom score function and sort on it. We've been able to manipulate the result
set using this, but the sort order is getting cached. One option is
{!cache=false}, but that would lead to performance degradation. Is there any
other way of achieving this?

-- 
Thanks & Regards,
Apoorva


Re: Help on custom sort

2014-09-21 Thread Apoorva Gaurav
Try using a custom value source parser and pass the formula for computing
the price to Solr; something like this:
http://java.dzone.com/articles/connecting-redis-solr-boosting

On Mon, Sep 22, 2014 at 1:38 AM, Scott Smith ssm...@mainstreamdata.com
wrote:

 There are likely several hundred groups.  Also, new groups will be added
 and some groups will be deleted.  So, I don't think putting a field in the
 docs works.  Having to add a new group price into 100 million+ documents
 doesn't seem reasonable.

 Right now I'm looking at
 http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html.
 This references a much older version of Solr (the blog is from 2011), so
 I will need to update the classes referenced.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Saturday, September 20, 2014 11:58 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Help on custom sort

 How many different groups are there? And can user A ever be part of more
 than one group?
 If
 1) there are a reasonably small number of groups (< 100 or so as a
 place to start)
 and
 2) a user is always part of a single group

 then you could store separate prices in each document by group, thus you'd
 have some fields like
 price_group_a: $100
 price_group_b: $101

 then sorting becomes trivial: you just sort on price_group_a for users
 in group A, etc. If the number of groups is unknown but not huge, dynamic
 fields could be used.

 If that's not the case, then you might be able to get clever with sorting
 by function, here's a place to start:
 https://cwiki.apache.org/confluence/display/solr/Function+Queries

 These can be arbitrarily complex, but I'm thinking something where the
 price returned by the function respects the group the user is in, perhaps
 even the min/max of all the groups the user is in. I admit I haven't really
 thought that through well though...

 Best,
 Erick

 On Sat, Sep 20, 2014 at 9:26 AM, Scott Smith ssm...@mainstreamdata.com
 wrote:
  I need to provide a custom sort option for sorting by price and I would
 like some suggestions.  It's not the straightforward "just sort by a price
 field in the document" scenario or I wouldn't be asking for help.  Here's
 the scenario I'm dealing with.
 
  I have 100 million+ documents (so multi-sharded).  Users search for
 documents they are interested in using a standard keyword search.  They
 then purchase documents they are interested in.  So far, nothing hard.
 
  Here's where things get interesting.  The documents come from multiple
 suppliers.  Each supplier sets a price for his documents and different
 suppliers will provide different pricing.
 
  That wouldn't be difficult except that *users* are divided up into
 different groups and depending on which group they are in, the supplier
 will charge the user a different price.  So, user A may pay one price for a
 document and user B may pay a different price for the same document just
 because user A and user B are in different groups.  I don't even know if
 the relative order or pricing is the same between different groups (e.g.,
 if document X is more expensive than document Y for a user in group M, it
 may not be more expensive for a user in group N).  The one thing that may
 make this doable is that supplier A will likely have the same price for all
 of his documents for each of the user groups.  So, a user in group A will
 pay the same price regardless of which document he buys from supplier 1.  A
 user in group B will also pay the same price for any document from supplier
 1; it's just that a user in group B will likely pay a different price than
 a user in group A.  So, within a supplier, the price varies based on user
 group, not the document.
 
  To summarize, one of the requirements for the system is that we provide
 the ability to sort search results based on price.  This would be easy
 except that the price a user pays depends not only on what he wants to buy,
 but also on what group he is in.
 
  I suspect there is some kind of custom solr module I'm going to have to
 write.  I'm thinking that the user group gets passed in as a custom solr
 parameter (I'm assuming that's possible??).  Then I'm thinking that there
 has to be some kind of in-memory database that tracks pricing based on user
 group and document supplier.
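
As a rough illustration of the in-memory pricing idea above, the following Python sketch keeps a price table keyed by (supplier, user group) and sorts a page of results by the price the current user's group would pay. The table contents, field names and function names are all hypothetical. The sort also runs in the application rather than inside Solr, so it only orders documents that were already fetched; a Solr-side solution would need a custom value source or something similar.

```python
# Hypothetical price table: one price per (supplier, user group) pair,
# matching the statement that price varies by group, not by document.
PRICES = {
    ("supplier1", "groupA"): 10.0,
    ("supplier1", "groupB"): 12.5,
    ("supplier2", "groupA"): 11.0,
    ("supplier2", "groupB"): 9.0,
}


def price_for(doc, group):
    """Look up the price a user in `group` pays for `doc`."""
    return PRICES[(doc["supplier"], group)]


def sort_by_price(docs, group):
    """Return docs ordered by ascending price for the given user group."""
    return sorted(docs, key=lambda d: price_for(d, group))


docs = [
    {"id": "X", "supplier": "supplier1"},
    {"id": "Y", "supplier": "supplier2"},
]

# The relative order can differ between groups, as the message suspects.
order_a = [d["id"] for d in sort_by_price(docs, "groupA")]
order_b = [d["id"] for d in sort_by_price(docs, "groupB")]
print(order_a, order_b)
```

With this made-up table, document X is cheaper than Y for group A but more expensive for group B, which is why a single stored price field cannot serve every group.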
 
  I'm happy to go read code, documents, links, etc if someone can point me
 in the right direction.  What kind of solr module am I likely going to
 write (extend) and are there some examples somewhere?  Maybe there's a way
 to do this without having to extend a solr module??
 
  Hope this makes sense.  Any help is appreciated.
 
  Scott
 
 




-- 
Thanks & Regards,
Apoorva


Re: wrong docFreq while executing query based on uniqueKey-field

2014-07-22 Thread Apoorva Gaurav
I faced the same issue some time back; the root cause is docs getting deleted
and created again without the index being optimized. Here is the discussion:
http://www.signaldump.org/solr/qpod/22731/docfreq-coming-to-be-more-than-1-for-unique-id-field


On Tue, Jul 22, 2014 at 4:56 PM, Johannes Siegert 
johannes.sieg...@marktjagd.de wrote:

 Hi.

 My solr-index (version=4.7.2.) has an id-field:

 <field name="id" type="string" indexed="true" stored="true"/>
 ...
 <uniqueKey>id</uniqueKey>

 The index will be updated once per hour.

 I use the following query to retrieve some documents:

 q=id:2^2 id:1^1

 I would expect document(2) to always come before document(1). But after
 many index updates, document(1) comes before document(2).

 With debug=true I could see the problem. The document(1) has a docFreq=2,
 while the document(2) has a docFreq=1.

 How could the docFreq of the uniqueKey field be higher than 1? Could anyone
 explain this behavior to me?

 Thanks!

 Johannes




-- 
Thanks & Regards,
Apoorva


Re: External File Field eating memory

2014-07-16 Thread Apoorva Gaurav
Thanks Kamal.


On Wed, Jul 16, 2014 at 11:43 AM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:

 Hi Apporva,

 This was my master server replication configuration:

 core/conf/solrconfig.xml

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="confFiles">../data/external_eff_views</str>
   </lst>
 </requestHandler>


 Only configuration files can be replicated. So, with the above config, the
 external file was getting replicated to
 core/conf/data/external_eff_views.
 But when Solr reads the external file, it looks for it in the
 core/data/external_eff_views
 location. So, first, the file was not getting replicated properly.
 Therefore, I did not opt for replicating the eff file.

 And the second thing is that whenever there is a change in configuration
 files, the core gets reloaded by itself to reflect the changes. I am not
 sure if you can disable this reloading.

 Finally, I thought of creating files on slaves in a different way.

 Thanks
 Kamal


 On Tue, Jul 15, 2014 at 11:00 AM, Apoorva Gaurav 
 apoorva.gau...@myntra.com
 wrote:

  Hey Kamal,
  What config changes have you made to set up replication of external
  files, and how have you disabled core reloading?
 
 
  On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal 
  kkroyal@gmail.com wrote:
 
   Hi All,
  
   It was found that the external file, which was getting replicated every
   10 minutes, was reloading the core as well. This was increasing the
   query time.
  
   Thanks
   Kamal Kishore
  
  
  
   On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal 
   kkroyal@gmail.com wrote:
  
With the above replication configuration, the eff file is getting
replicated to core/conf/data/external_eff_views (a new dir "data" is being
created in the conf dir), but it is not getting replicated to
core/data/external_eff_views
on the slave.
   
Please help.
   
   
On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:
   
Thanks for your guidance Alexandre Rafalovitch.
   
I am looking into this seriously.
   
Another question: I am facing an error in replication of the eff file.
   
This is master replication configuration:
   
core/conf/solrconfig.xml
   
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">../data/external_eff_views</str>
  </lst>
</requestHandler>
   
   
The eff file is present at core/data/external_eff_views location.
   
   
On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:
   
This might be related:
   
https://issues.apache.org/jira/browse/SOLR-3514
   
   
On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:
   
 Hi Team,

 I have recently implemented EFF in Solr. There are about 1.5 lakh
 (150,000, unsorted) values in the external file. After this
 implementation, the server has become slow. The Solr query time has
 also increased.

 Can anybody confirm whether these issues are because of this
 implementation? Does EFF eat up that much memory?

 Regards
 Kamal Kishore

   
   
   
--
Regards,
Shalin Shekhar Mangar.
   
   
   
   
  
 
 
 
  --
  Thanks & Regards,
  Apoorva
 




-- 
Thanks & Regards,
Apoorva


Re: Sort not working in solr

2014-07-15 Thread Apoorva Gaurav
In fact, it's better to use TrieIntField instead of IntField.
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3ccab_8yd9yp259kk4ciybbprjcpwqp6vd7yvrtjr1eubew_ky...@mail.gmail.com%3E
http://stackoverflow.com/questions/13372323/what-is-the-correct-solr-fieldtype-to-use-for-sorting-integer-values


On Tue, Jul 15, 2014 at 1:09 PM, スガヌマヨシカズ suganoo2...@gmail.com wrote:

 I think type=text_general makes the numbers sort as character strings.

 How about making it type=int or type=long instead of text_general?

 
 <field name="business_point" type="text_general" indexed="true"
 stored="true" required="false" multiValued="false"/>
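
The character-sort explanation can be checked outside Solr. This minimal Python sketch takes the ten values reported in the question and sorts them once as strings and once as integers; the variable names are illustrative only, and it only mimics the effect of sorting a text field versus a numeric field rather than reproducing Solr's internals.

```python
# The ten business_point values from the question, held as strings.
values = ["9", "8", "7", "6", "5", "45", "4", "4", "10", "1"]

# String comparison works character by character, so "5" > "45" > "4"
# and "10" sorts next to "1". This reproduces the order the poster saw.
as_text = sorted(values, reverse=True)

# Converting to integers first gives the numeric descending order.
as_numbers = sorted((int(v) for v in values), reverse=True)

print(as_text)
print(as_numbers)
```

The string-sorted list matches the "wrong" order in the question, which supports switching the field to a numeric type.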
 

 Regards,
 suganuma


 2014-07-15 16:24 GMT+09:00 madhav bahuguna madhav.bahug...@gmail.com:

  I am trying to sort my records, but the result I get is not correct.
  My URL query:
 
  http://localhost:8983/solr/select/?q=*:*&fl=business_point&sort=business_point+desc
 
  I am trying to sort my records by business_point, but the result I get
  looks like this:
  9
  8
  7
  6
  5
  45
  4
  4
  10
  1
 
  Why am I getting my results in the wrong order?
  My schema looks like this:
  <field name="business_point" type="text_general" indexed="true"
  stored="true" required="false" multiValued="false"/>
 
  --
  Regards
  Madhav Bahuguna
 




-- 
Thanks & Regards,
Apoorva


Re: External File Field eating memory

2014-07-14 Thread Apoorva Gaurav
Hey Kamal,
What config changes have you made to set up replication of external
files, and how have you disabled core reloading?


On Wed, Jul 9, 2014 at 11:30 AM, Kamal Kishore Aggarwal 
kkroyal@gmail.com wrote:

 Hi All,

 It was found that the external file, which was getting replicated every
 10 minutes, was reloading the core as well. This was increasing the query
 time.

 Thanks
 Kamal Kishore



 On Thu, Jul 3, 2014 at 12:48 PM, Kamal Kishore Aggarwal 
 kkroyal@gmail.com wrote:

  With the above replication configuration, the eff file is getting
  replicated to core/conf/data/external_eff_views (a new dir "data" is being
  created in the conf dir), but it is not getting replicated to
  core/data/external_eff_views
  on the slave.
 
  Please help.
 
 
  On Thu, Jul 3, 2014 at 12:21 PM, Kamal Kishore Aggarwal 
  kkroyal@gmail.com wrote:
 
  Thanks for your guidance Alexandre Rafalovitch.
 
  I am looking into this seriously.
 
  Another question: I am facing an error in replication of the eff file.
 
  This is master replication configuration:
 
  core/conf/solrconfig.xml
 
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">../data/external_eff_views</str>
    </lst>
  </requestHandler>
 
 
  The eff file is present at core/data/external_eff_views location.
 
 
  On Thu, Jul 3, 2014 at 11:50 AM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
  This might be related:
 
  https://issues.apache.org/jira/browse/SOLR-3514
 
 
  On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal 
  kkroyal@gmail.com wrote:
 
   Hi Team,
  
   I have recently implemented EFF in Solr. There are about 1.5 lakh
   (150,000, unsorted) values in the external file. After this
   implementation, the server has become slow. The Solr query time has
   also increased.
  
   Can anybody confirm whether these issues are because of this
   implementation? Does EFF eat up that much memory?
  
   Regards
   Kamal Kishore
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 
 
 




-- 
Thanks & Regards,
Apoorva


Re: docFreq coming to be more than 1 for unique id field

2014-06-23 Thread Apoorva Gaurav
Hello Markus, Ahmet,
Forgot to update the thread: optimization works, i.e. after optimizing, all
unique keys have a docFreq of 1.


On Wed, Jun 18, 2014 at 1:58 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : text in it, query is of the type keywords:(word1 OR word2 ... OR wordN).
 : The client is relying on default relevancy based sort returned by solr.
 : Some documents can get penalised because of some other documents which were
 : deleted. Is this functionality correct?

 yes, because term stats are over the entire index including deleted
 documents still in segments -- information about deletions isn't purged
 from the index until a segment is merged and the stats are recomputed over
 the docs/terms in the new segment.

 the only way to get those types of statistics at request time such that
 they were *not* affected by deleted documents would involve scanning every
 doc to compute them -- which would defeat the point of having the inverted
 index.
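
To see why a stale docFreq changes the ranking, here is a small Python sketch of the idf term used by Lucene's classic TF-IDF similarity in the 4.x line, idf = 1 + ln(numDocs / (docFreq + 1)). The index size of 1000 is an assumed figure chosen only for illustration, and the function name is hypothetical.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as computed by Lucene's classic TF-IDF similarity (4.x)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


num_docs = 1000  # assumed index size, for illustration only

# A unique key whose old copy has been purged: docFreq is 1.
idf_clean = classic_idf(1, num_docs)

# A unique key with one deleted-but-not-merged copy: docFreq is 2.
idf_stale = classic_idf(2, num_docs)

print(round(idf_clean, 3), round(idf_stale, 3))
```

The term with the leftover deleted copy gets the lower idf, so a document matched through it can score below one with a smaller boost until a merge recomputes the statistics.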


 -Hoss
 http://www.lucidworks.com/




-- 
Thanks & Regards,
Apoorva


docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Apoorva Gaurav
Hello All,

We are using solr 4.4.0. We have a uniqueKey of type solr.StrField. We need
to extract docs in a pre-defined order if they match a certain condition.
Our query is of the format

uniqueField:(id1 ^ weight1 OR id2 ^ weight2 ... OR idN ^ weightN)
where weight1 > weight2 > ... > weightN
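
For concreteness, a query of this shape could be assembled as in the Python sketch below. The helper name and the choice of weights (N down to 1) are assumptions made for illustration; the original message does not say how the weights are chosen.

```python
def build_boosted_id_query(field, ids):
    """Build field:(id1^N OR id2^N-1 ... OR idN^1) with descending boosts."""
    n = len(ids)
    # Earlier ids get larger boosts so they are meant to rank first.
    clauses = [f"{doc_id}^{n - pos}" for pos, doc_id in enumerate(ids)]
    return f"{field}:({' OR '.join(clauses)})"


query = build_boosted_id_query("uniqueField", ["id1", "id2", "id3"])
print(query)
```

As the rest of the thread shows, boosts alone do not guarantee this order, because each clause's score also depends on index statistics such as docFreq.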

But the result is not in the desired order. On debugging the query we've
found out that for some of the documents docFreq is higher than 1 and hence
their tf-idf based score is less than others. What can be the reason behind
a unique id field having docFreq greater than 1?  How can we prevent it?

-- 
Thanks & Regards,
Apoorva


Re: docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Apoorva Gaurav
Yes, we have updates on these. Didn't try optimizing; will do. But isn't the
unique field supposed to be unique?


On Tue, Jun 17, 2014 at 8:37 PM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi,

 Just a guess, do you have deletions? What happens when you optimize and
 re-try?



 On Tuesday, June 17, 2014 5:58 PM, Apoorva Gaurav 
 apoorva.gau...@myntra.com wrote:
 Hello All,

 We are using solr 4.4.0. We have a uniqueKey of type solr.StrField. We need
 to extract docs in a pre-defined order if they match a certain condition.
 Our query is of the format

 uniqueField:(id1 ^ weight1 OR id2 ^ weight2 ... OR idN ^ weightN)
 where weight1 > weight2 > ... > weightN

 But the result is not in the desired order. On debugging the query we've
 found out that for some of the documents docFreq is higher than 1 and hence
 their tf-idf based score is less than others. What can be the reason behind
 a unique id field having docFreq greater than 1?  How can we prevent it?

 --
 Thanks & Regards,
 Apoorva




-- 
Thanks & Regards,
Apoorva


Re: docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Apoorva Gaurav
Will try optimizing and then respond to the thread.


On Tue, Jun 17, 2014 at 8:47 PM, Markus Jelsma markus.jel...@openindex.io
wrote:

 Yes, it is unique, but deleted documents are not purged immediately, only on
 optimize/forceMerge or during regular segment merges. The problem is
 that until then they keep messing with the statistics.
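
The behaviour described here can be mimicked with a toy model. The Python sketch below is not how Lucene is implemented; it is a deliberately simplified, hypothetical class in which an update marks the old copy as deleted and appends a new one, and docFreq counts every posting until a merge drops the deleted ones.

```python
class ToyIndex:
    """Toy model of deletes that linger in postings until a merge."""

    def __init__(self):
        self.postings = {}   # unique key -> list of internal doc numbers
        self.deleted = set() # internal doc numbers marked as deleted
        self.live = {}       # unique key -> current live doc number
        self.next_doc = 0

    def update(self, key):
        # Re-adding an existing key only marks the old copy as deleted.
        old = self.live.get(key)
        if old is not None:
            self.deleted.add(old)
        doc = self.next_doc
        self.next_doc += 1
        self.postings.setdefault(key, []).append(doc)
        self.live[key] = doc

    def doc_freq(self, key):
        # Counts all postings, including deleted docs not yet merged away.
        return len(self.postings.get(key, []))

    def merge(self):
        # A merge (or optimize) rewrites postings without deleted docs.
        self.postings = {
            key: [d for d in docs if d not in self.deleted]
            for key, docs in self.postings.items()
        }
        self.deleted.clear()


index = ToyIndex()
index.update("1")
index.update("2")
index.update("1")  # update of an existing unique key

freq_before = index.doc_freq("1")
index.merge()
freq_after = index.doc_freq("1")
print(freq_before, freq_after)
```

In this model the updated key reports a docFreq of 2 until the merge, after which it drops back to 1, which is the pattern reported in the thread.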

 -Original message-
  From:Apoorva Gaurav apoorva.gau...@myntra.com
  Sent: Tuesday 17th June 2014 17:16
  To: solr-user solr-user@lucene.apache.org; Ahmet Arslan 
 iori...@yahoo.com
  Subject: Re: docFreq coming to be more than 1 for unique id field
 
  Yes, we have updates on these. Didn't try optimizing; will do. But isn't
  the unique field supposed to be unique?
 
 
  On Tue, Jun 17, 2014 at 8:37 PM, Ahmet Arslan iori...@yahoo.com.invalid
 
  wrote:
 
   Hi,
  
   Just a guess, do you have deletions? What happens when you optimize and
   re-try?
  
  
  
   On Tuesday, June 17, 2014 5:58 PM, Apoorva Gaurav 
   apoorva.gau...@myntra.com wrote:
   Hello All,
  
   We are using solr 4.4.0. We have a uniqueKey of type solr.StrField. We
   need to extract docs in a pre-defined order if they match a certain
   condition. Our query is of the format
  
   uniqueField:(id1 ^ weight1 OR id2 ^ weight2 ... OR idN ^ weightN)
   where weight1 > weight2 > ... > weightN
  
   But the result is not in the desired order. On debugging the query
   we've found out that for some of the documents docFreq is higher than 1
   and hence their tf-idf based score is less than others. What can be the
   reason behind a unique id field having docFreq greater than 1?  How can
   we prevent it?
  
   --
   Thanks & Regards,
   Apoorva
  
  
 
 
  --
  Thanks & Regards,
  Apoorva
 




-- 
Thanks & Regards,
Apoorva


Re: docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Apoorva Gaurav
Currently we are not using SolrJ but are simply interacting with Solr via
JSON over HTTP. This will change in a couple of months, but we are not
there yet. As of now we put all the logic in query building, use the query
to search Solr, and pass the JSON it returns on to the front end. I
know this is not the ideal approach, but that's what we have at the moment.
Hence we need a way to deterministically order the result set, provided the
documents match the other search criteria.


On Tue, Jun 17, 2014 at 10:28 PM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 All index-wide statistics (like the docFreq of each term) are over the
 entire index, which includes deleted docs -- because it's an *inverted*
 index, it's not feasible to update those statistics to account for deleted
 docs (that would basically kill all the performance advantages that come
 from having an inverted index).


 : uniqueField:(id1 ^ weight1 OR id2 ^ weight2 ... OR idN ^ weightN)
 : where weight1 > weight2 > ... > weightN
 :
 : But the result is not in the desired order. On debugging the query we've

 if you are requesting a small number of docs, and all the docs you are
 requesting are returned in a single request, why do you care what order
 they are in?  why not just put them in the order you want on the client?

 That would not only make your solr request simpler, but would almost
 certainly be a bit *faster* since you could sort exactly as you wanted w/o
 needing to compute a complex score that you don't actually care about.



 -Hoss
 http://www.lucidworks.com/




-- 
Thanks & Regards,
Apoorva


Re: docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Apoorva Gaurav
OK, let's for a moment forget about this specific use case and consider a
more general case. Let's say the field name is keywords and we are storing
text in it; the query is of the type keywords:(word1 OR word2 ... OR wordN).
The client is relying on the default relevancy-based sort returned by Solr.
Some documents can get penalised because of some other documents which were
deleted. Is this functionality correct?


On Wed, Jun 18, 2014 at 12:52 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:


 : Currently we are not using SolrJ but are simply interacting with solr with
 : json over http, this will change in a couple of months but currently not
 : there. As of now we are putting all the logic in query building, using it
 : to query solr and then passing on the json returned by it to front end. I
 : know this is not the ideal approach, but that's what we have at the moment.
 : Hence need a way of deterministically order the result set provided they
 : match other search criteria.

 whether you are using SolrJ or not doesn't really change my point at all --
 you are jumping through all sorts of hoops, and asking solr to jump through
 all sorts of hoops, for a score you don't actually care about, and isn't
 going to work perfectly for what you want anyway because of the
 fundamental nature of the inverted index stats, leading you to look for
 even smaller, higher, hoops to try and jump through.

 it would be far simpler to just ask for the exact set of N documents you
 want from Solr in default order, re-order the resulting documents in the
 magic order you already know and care about, and then give that modified
 response to your front end.
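
The client-side re-ordering suggested here needs only a few lines. The Python sketch below assumes the Solr JSON response has already been parsed into a list of dicts with an "id" field; the function name and the sample documents are hypothetical.

```python
def reorder(docs, wanted_ids, key="id"):
    """Return docs arranged in the order given by wanted_ids."""
    rank = {doc_id: pos for pos, doc_id in enumerate(wanted_ids)}
    # Docs whose id is not in wanted_ids go last; sorted() is stable,
    # so they keep the relative order Solr returned them in.
    return sorted(docs, key=lambda d: rank.get(d[key], len(rank)))


# Documents as they might come back from Solr, in arbitrary order.
docs = [{"id": "id3"}, {"id": "id1"}, {"id": "id2"}]

ordered = reorder(docs, ["id1", "id2", "id3"])
ordered_ids = [d["id"] for d in ordered]
print(ordered_ids)
```

Because the order comes from the id list rather than from scores, it is unaffected by docFreq or by deleted documents still present in the index.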


 -Hoss
 http://www.lucidworks.com/




-- 
Thanks & Regards,
Apoorva