Re: Tweaking boosts for more search results variety

2013-09-10 Thread Marc Sturlese
This is totally deprecated, but it may be helpful if you want to re-sort
some documents:
https://issues.apache.org/jira/browse/SOLR-1311



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Tweaking-boosts-for-more-search-results-variety-tp4088302p4089044.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tweaking boosts for more search results variety

2013-09-10 Thread Sai Gadde
Perfect. This is exactly what we need!

I wish this were available as a plugin, or that a feature like this
existed in a mainstream Solr release.

Still, this is a great resource for us. Thanks, Marc, for pointing us to
this very useful information.

Thanks all for the help.







Re: Tweaking boosts for more search results variety

2013-09-08 Thread Sai Gadde
Sorry for the delayed response.

The limitation in this scenario is that we have 5 million indexed documents
from only about 1000 sites. If results are grouped by site, we will not be
able to show more than a couple of pages for a lot of search keywords.


Example: A search for "Solr" has 1000 matches, but from only 20 sites.
Of these 20 sites:
10 sites are of sitetype A - boost 5
7 sites are of sitetype B - boost 2
3 sites are of sitetype C - boost 1

Limitation 1: If these are grouped by site, only 20 results would be
displayed, across 2 pages (10 per page).
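
The arithmetic behind Limitation 1 can be sketched as follows. This is only
an illustration: it assumes one displayed result per site group and 10
results per page, and the helper name pages_needed is made up for the
example.

```python
import math

def pages_needed(result_count, per_page=10):
    """Number of result pages needed to show result_count entries."""
    return math.ceil(result_count / per_page)

matches, sites = 1000, 20

# Without grouping, every match is shown.
ungrouped_pages = pages_needed(matches)

# Grouped by site (assuming one displayed result per group),
# only one entry per site is shown.
grouped_pages = pages_needed(sites)

print(ungrouped_pages, grouped_pages)
```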

We still want to display all the results. For a better user experience,
we would ideally like page 1 to show 10 results from 10 distinct sites of
sitetype A (which already has the higher boost) or, in a real-world
scenario, results from 7-8 distinct sites. In our case, we currently see
something like 7 matches on a page from a single site.

Limitation 2: Inverse Document Frequency (IDF) would have helped here, but
in that case our preferential boost for sitetypes is ignored and some
results from sitetype C would come out on top due to the IDF boost.

What we want is some way to control the variety of sites displayed in the
search results while keeping the preferential boost in place.
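
One way to picture the desired behavior is a client-side re-ordering pass
over results that Solr has already ranked with the boosts applied. The
sketch below is not a Solr feature and is not taken from this thread: the
function name diversify, the per-page cap max_per_site, and the document
layout (dicts with a "site" key) are all assumptions made for illustration.
It defers hits that exceed the per-site cap to later pages instead of
dropping them, so every result is still shown.

```python
from collections import Counter

def diversify(ranked_docs, page_size=10, max_per_site=2):
    """Re-order score-ranked docs so each page holds at most
    max_per_site hits per site, without dropping any document."""
    remaining = list(enumerate(ranked_docs))  # keep the original rank
    reordered = []
    while remaining:
        page, deferred, per_site = [], [], Counter()
        for rank, doc in remaining:
            if len(page) < page_size and per_site[doc["site"]] < max_per_site:
                page.append((rank, doc))
                per_site[doc["site"]] += 1
            else:
                deferred.append((rank, doc))
        # Too few distinct sites left to fill the page: top it up
        # with the best deferred hits, ignoring the cap.
        fill = page_size - len(page)
        page.extend(deferred[:fill])
        deferred = deferred[fill:]
        page.sort(key=lambda pair: pair[0])  # restore score order within the page
        reordered.extend(doc for _, doc in page)
        remaining = deferred
    return reordered

docs = [
    {"id": "a1", "site": "A"}, {"id": "a2", "site": "A"},
    {"id": "a3", "site": "A"}, {"id": "a4", "site": "A"},
    {"id": "b1", "site": "B"}, {"id": "c1", "site": "C"},
]
print([d["id"] for d in diversify(docs, page_size=4, max_per_site=2)])
```

Because the input order already reflects the sitetype boosts, the pass only
pushes over-represented sites further down; it never promotes a lower-scored
site above the cap-respecting hits of a higher-scored one within a page.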

Thanks in advance







Re: Tweaking boosts for more search results variety

2013-09-07 Thread Furkan KAMACI
What do you mean by *these limitations*? Do you want to perform multiple
groupings at the same time?





Re: Tweaking boosts for more search results variety

2013-09-06 Thread Sai Gadde
Thank you Jack for the suggestion.

We can try grouping by site. But considering that the number of sites is
only about 1000 against an index size of 5 million, one can expect that most
of the hits would be hidden, and for certain specific keywords only a handful
of actual results could be displayed if results are grouped by site.

We already group on a signature field to identify duplicate content in
these 5 million+ docs. But there the number of duplicates is only about
3-5% at most.

Is there any workaround for these limitations with grouping?

Thanks
Shyam






Re: Tweaking boosts for more search results variety

2013-09-05 Thread Jack Krupansky
The grouping (field collapsing) feature somewhat addresses this: group by a
site field, and then if more than one or a few top pages are from the same
site, they get grouped or collapsed so that you can see more sites in a few
results.


See:
http://wiki.apache.org/solr/FieldCollapsing
https://cwiki.apache.org/confluence/display/solr/Result+Grouping
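
As a rough sketch of what such a request could look like, the snippet below
builds the query string for a grouped search. The group, group.field and
group.limit request parameters are the standard result-grouping parameters;
the /select path, the query term and the helper name grouped_query_params
are assumptions for illustration only.

```python
from urllib.parse import urlencode

def grouped_query_params(query, group_field="site", per_group=1, rows=10):
    """Build request parameters for a Solr result-grouping query."""
    return {
        "q": query,
        "group": "true",             # turn result grouping on
        "group.field": group_field,  # collapse hits that share this field value
        "group.limit": per_group,    # docs returned per group
        "rows": rows,                # with grouping on, this counts groups
    }

params = grouped_query_params("solr")
print("/select?" + urlencode(params))
```

Raising group.limit returns more documents per site, at the cost of fewer
distinct sites fitting on a page.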

-- Jack Krupansky

-Original Message- 
From: Sai Gadde

Sent: Thursday, September 05, 2013 2:27 AM
To: solr-user@lucene.apache.org
Subject: Tweaking boosts for more search results variety

Our index is aggregated content from various sites on the web. We want to
provide a good user experience by showing multiple sites in the search
results. In our setup, most of the top results come from the same site.

Here is some information regarding our queries and schema:
   site - String field. We have about 1000 sites in the index.
   sitetype - String field. We have 3 site types.
omitNorms=true for both fields.

Doc count varies widely by site and sitetype, by a factor of 10 to 1000.
Total index size is about 5 million docs.
Solr Version: 4.0

In our queries we have a fixed, preferential boost for certain sites.
sitetype has a different fixed boost for each of its 3 possible values. We
turned off Inverse Document Frequency (IDF) for these boosts to work
properly. Other text fields are boosted based on search keywords only.
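
The message does not say how these boosts are applied, so the following is
only one possible shape, assuming the edismax query parser and its bq (boost
query) parameter. The sitetype values A, B and C and the weights 5, 2 and 1
are borrowed from the example given later in the thread; the helper name
sitetype_boost_query is hypothetical.

```python
def sitetype_boost_query(boosts, field="sitetype"):
    """Render fixed per-value boosts as a boost-query string."""
    return " ".join(f"{field}:{value}^{weight}" for value, weight in boosts.items())

boost_params = {
    "q": "solr",
    "defType": "edismax",  # assumption: the thread does not name the parser
    "bq": sitetype_boost_query({"A": 5, "B": 2, "C": 1}),
}
print(boost_params["bq"])
```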

With this setup we often see a bunch of hits from a single site, followed
by a bunch from the next site, and so on.
Is there any way to see results from a variety of sites and still keep
the preferential boosts in place?