Re: Tweaking boosts for more search results variety
This is totally deprecated, but it may be helpful if you want to re-sort some documents: https://issues.apache.org/jira/browse/SOLR-1311

--
View this message in context: http://lucene.472066.n3.nabble.com/Tweaking-boosts-for-more-search-results-variety-tp4088302p4089044.html
Sent from the Solr - User mailing list archive at Nabble.com.
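For readers who want the re-sorting idea without applying an old patch, here is a minimal client-side sketch. It is not the code from SOLR-1311. It assumes the client has already fetched a score-ordered list of (doc id, site) pairs from Solr, and the function name `diversify` and its `max_per_site` parameter are hypothetical. Within each page it takes documents in score order but skips a site once that site has reached its per-page cap. Skipped documents are deferred to later pages, not dropped, so every hit is still shown and the boosted ordering is otherwise preserved.

```python
from typing import Dict, List, Tuple

# (doc_id, site) pairs, already sorted by Solr score, best first.
Doc = Tuple[str, str]


def diversify(ranked: List[Doc], page_size: int = 10, max_per_site: int = 1) -> List[Doc]:
    """Hypothetical client-side re-sort: keep score order, cap hits per site per page.

    Documents over a site's cap are deferred to later pages, never dropped.
    If too few distinct sites remain to fill a page, the page is topped up
    with the best deferred documents.
    """
    remaining = list(ranked)
    result: List[Doc] = []
    while remaining:
        page: List[Doc] = []
        per_site: Dict[str, int] = {}
        deferred: List[Doc] = []
        for doc in remaining:
            site = doc[1]
            if len(page) < page_size and per_site.get(site, 0) < max_per_site:
                page.append(doc)
                per_site[site] = per_site.get(site, 0) + 1
            else:
                deferred.append(doc)  # keeps score order for the next page
        while len(page) < page_size and deferred:
            page.append(deferred.pop(0))  # relax the cap rather than leave gaps
        result.extend(page)
        remaining = deferred
    return result


hits = [("d1", "a.com"), ("d2", "a.com"), ("d3", "a.com"), ("d4", "b.com"), ("d5", "c.com")]
print(diversify(hits, page_size=3))
# [('d1', 'a.com'), ('d4', 'b.com'), ('d5', 'c.com'), ('d2', 'a.com'), ('d3', 'a.com')]
```

Because this runs on the client, it can only re-sort the rows actually fetched, so the request would need to ask for a window several pages deep (via `rows`) and re-sort that window.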
Re: Tweaking boosts for more search results variety
Perfect. This is exactly what we need! I wish there were a plugin for this, or a similar feature in the mainstream Solr release. Still, this is a great resource for us. Thanks, Marc, for pointing to this very useful information, and thanks all for the help.

On Tue, Sep 10, 2013 at 5:30 PM, Marc Sturlese <marc.sturl...@gmail.com> wrote:
> This is totally deprecated but maybe can be helpful if you want to re-sort some documents https://issues.apache.org/jira/browse/SOLR-1311
Re: Tweaking boosts for more search results variety
Sorry for the delayed response.

The limitation in our scenario is that we have 5 million indexed documents from only about 1000 sites. If results are grouped by site, we will not be able to show more than a couple of pages for a lot of search keywords.

Example: a search for "Solr" has 1000 matches, but from only 20 sites. Of these 20 sites:
10 sites are of sitetype A - boost 5
7 sites are of sitetype B - boost 2
3 sites are of sitetype C - boost 1

Limitation 1: If these are grouped by site, only 20 results would be displayed, across 2 pages (10 per page). We still want to display all the results. For a better user experience, we would ideally like page 1 to show 10 results from 10 distinct sites of sitetype A (which already has the higher boost), or, in a real-world scenario, from 7-8 distinct sites. In our case we currently see around 7 matches on a single page from one site.

Limitation 2: Inverse document frequency (IDF) would have helped here, but then our preferential boost for sitetypes is effectively overridden, and some results from sitetype C come out on top because of the IDF boost.

What we want is some way to control the variety of sites displayed in the search results while keeping the preferential boosts in place.

Thanks in advance

On Sun, Sep 8, 2013 at 6:36 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> What do you mean with *these limitations*? Do you want to make multiple grouping at same time?
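Limitation 2 can be made concrete with Lucene's classic TF-IDF scoring, which Solr 4.0 uses by default (DefaultSimilarity, where idf = 1 + ln(numDocs / (docFreq + 1))). The sketch below assumes hypothetical per-sitetype document counts, since the thread only says counts vary by 10 to 1000 times, and a simplified score that treats a matching `sitetype:X^boost` clause as boost × idf² (tf = 1, norms omitted, queryNorm and coord ignored). With these assumed counts and the example boosts (5, 2, 1), the rare sitetype C gets the largest contribution even though its boost is the smallest; with IDF turned off, the A > B > C order comes back.

```python
import math

NUM_DOCS = 5_000_000  # total index size mentioned in the thread

# Hypothetical per-sitetype document counts (the thread only says counts vary widely).
DOC_FREQ = {"A": 4_000_000, "B": 900_000, "C": 100_000}
BOOST = {"A": 5.0, "B": 2.0, "C": 1.0}  # example sitetype boosts from the thread


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as defined by Lucene's classic DefaultSimilarity: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def clause_score(sitetype: str, use_idf: bool = True) -> float:
    """Simplified contribution of a matching 'sitetype:X^boost' clause.

    Classic TF-IDF applies idf twice (query weight and field weight), so with
    tf = 1, norms omitted, and queryNorm/coord ignored, this is boost * idf^2.
    """
    idf = classic_idf(DOC_FREQ[sitetype], NUM_DOCS) if use_idf else 1.0
    return BOOST[sitetype] * idf * idf


for st in "ABC":
    # Compare each sitetype's contribution with IDF on and off.
    print(st, round(clause_score(st), 2), clause_score(st, use_idf=False))
```

The exact numbers depend entirely on the real document counts; the point is only that a small boost on a rare term can outweigh a large boost on a common one once IDF is squared into the score.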
Re: Tweaking boosts for more search results variety
What do you mean by *these limitations*? Do you want to do multiple groupings at the same time?

2013/9/6 Sai Gadde <gadde@gmail.com>
> Is there any workaround for these limitations with grouping?
Re: Tweaking boosts for more search results variety
Thank you, Jack, for the suggestion. We can try grouping by site. But considering that there are only about 1000 sites against an index size of 5 million documents, we can expect most of the hits to be hidden, and for certain specific keywords only a handful of actual results could be displayed if results are grouped by site.

We already group on a signature field to identify duplicate content in these 5 million+ docs, but there the duplicates make up only about 3-5% of the documents at most.

Is there any workaround for these limitations with grouping?

Thanks
Shyam

On Thu, Sep 5, 2013 at 9:16 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> The grouping (field collapsing) feature somewhat addresses this - group by a site field and then if more than one or a few top pages are from the same site they get grouped or collapsed so that you can see more sites in a few results.
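To see why grouping by site caps the number of visible hits, here is a rough emulation of `group.field=site` with `group.limit` applied to a score-ordered list. The function name `group_by_site` and the data are hypothetical; the data mirrors the thread's example of 1000 matches from only 20 sites. Groups are ordered by their best document, roughly as Solr orders groups under the default score sort, and each group keeps at most `group_limit` documents. With `group_limit=1`, only 20 documents survive, i.e. 2 pages of 10; in general the visible count is at most (matching sites) × group.limit.

```python
import math
from typing import Dict, List, Tuple

Doc = Tuple[str, str]  # (doc_id, site), sorted by score, best first


def group_by_site(ranked: List[Doc], group_limit: int = 1) -> Dict[str, List[Doc]]:
    """Rough emulation of group.field=site plus group.limit on a score-ordered list.

    Groups come out in the order of their best document (Python 3.7+ dicts keep
    insertion order), and each group keeps at most group_limit documents;
    everything else is hidden.
    """
    groups: Dict[str, List[Doc]] = {}
    for doc in ranked:
        bucket = groups.setdefault(doc[1], [])
        if len(bucket) < group_limit:
            bucket.append(doc)
    return groups


# Hypothetical data: 1000 matches spread over only 20 sites.
ranked = [(f"doc{i}", f"site{i % 20}") for i in range(1000)]
groups = group_by_site(ranked, group_limit=1)
visible = sum(len(docs) for docs in groups.values())
pages = math.ceil(visible / 10)
print(len(groups), visible, pages)  # 20 sites, 20 visible docs, 2 pages of 10
```

Raising `group_limit` only moves the ceiling (3 per site gives 60 documents here); it does not remove it, which is the concern raised above.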
Re: Tweaking boosts for more search results variety
The grouping (field collapsing) feature somewhat addresses this: group by a site field, and then if more than one or a few of the top pages are from the same site, they get grouped (collapsed) so that you can see more sites in a few results. See:
http://wiki.apache.org/solr/FieldCollapsing
https://cwiki.apache.org/confluence/display/solr/Result+Grouping

-- Jack Krupansky

-----Original Message-----
From: Sai Gadde
Sent: Thursday, September 05, 2013 2:27 AM
To: solr-user@lucene.apache.org
Subject: Tweaking boosts for more search results variety

Our index is aggregated content from various sites on the web. We want a good user experience by showing multiple sites in the search results. In our setup, we are seeing most of the top results coming from the same site.

Here is some information about our queries and schema:
site - String field. We have about 1000 sites in the index.
sitetype - String field. We have 3 site types.
omitNorms=true for both fields.
Doc count varies widely by site and sitetype, by a factor of 10 to 1000.
Total index size is about 5 million docs.
Solr version: 4.0

In our queries we have a fixed, preferential boost for certain sites. sitetype has a different fixed boost for each of its 3 possible values. We turned off inverse document frequency (IDF) so that these boosts work properly. Other text fields are boosted based on the search keywords only.

With this setup we often see a bunch of hits from a single site, followed by a bunch from the next site, and so on. Is there any way to get results from a variety of sites and still keep the preferential boosts in place?
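As a sketch of what the grouping suggestion might look like as a request that also keeps fixed sitetype boosts: `defType`, `bq`, `group`, `group.field`, `group.limit`, `group.ngroups`, `rows`, `start` and `wt` are standard Solr request parameters, but the host, the core name (`collection1`) and the boost values are assumptions for illustration. Note that a `bq` clause is still scored through the similarity, so IDF affects it unless it has been disabled, as described above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def grouped_query_url(user_query: str, per_site: int = 2, rows: int = 10, start: int = 0) -> str:
    """Build a select URL that keeps fixed sitetype boosts and collapses hits by site."""
    params = [
        ("q", user_query),
        ("defType", "edismax"),
        # Additive boost query; values are hypothetical (A=5, B=2, C=1).
        ("bq", "sitetype:A^5 sitetype:B^2 sitetype:C^1"),
        ("group", "true"),
        ("group.field", "site"),         # collapse on the site string field
        ("group.limit", str(per_site)),  # documents returned per site
        ("group.ngroups", "true"),       # also report the number of matching sites
        ("rows", str(rows)),             # in the grouped format, the number of groups
        ("start", str(start)),
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(grouped_query_url("solr"))
```

Paging with `start` here moves through groups (sites), not individual documents, which is why grouping alone limits how many pages a query with few matching sites can fill.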