Re: Remove duplicate suggestions in Solr
Hi Arcadius,

Thank you for your reply.

So this means that the de-duplication has to be done at indexing time, and not at query time?

Yes, currently I'm building on "search" to do my suggestions, as I faced some issues with the suggestion components in Solr 5.1.0. Will the suggestion components solve this issue of duplicate suggestions?

There might also be cases where about 1/2 to 3/4 of the content of my indexed documents is the same, with only the remaining 1/4 to 1/2 being different. So this will probably lead to cases where the indexed documents are different, but a search may return the part of the documents that is the same.

Regards,
Edwin

On 23 August 2015 at 21:44, Arcadius Ahouansou wrote:
> Hi Edwin.
>
> What you are doing here is "search" as Solr has separate components for
> doing suggestions.
>
> About dedup,
>
> - have a look at the manual
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> - or simply do your dedup upfront before ingesting into Solr by assigning
> the same "id" to all doc with same "textng" (may require a different index
> if you want to keep the existing data with duplicate for other purpose)
>
> - Or you could use result grouping/fieldCollapsing to group/dedup your
> result
>
> Hope this helps
>
> Arcadius.
---
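On the indexing-time versus query-time question: the result grouping option in the quoted reply works at query time. Below is a minimal sketch of how such a request might be built. It assumes a hypothetical single-valued, non-tokenized copy of the text in a field called `textng_s` (not part of the schema shown in this thread), since grouping needs a single-valued field. The host, core and handler path come from the query in the original post; the helper name is illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical field: a single-valued string copy of "textng" (an assumption,
# not part of the schema shown in this thread).
GROUP_FIELD = "textng_s"


def build_grouped_query(user_input, rows=10):
    """Return a query string that asks Solr to group hits by GROUP_FIELD."""
    params = [
        ("q", user_input),
        ("defType", "edismax"),
        ("rows", rows),
        ("group", "true"),             # turn on result grouping
        ("group.field", GROUP_FIELD),  # one group per distinct field value
        ("group.limit", 1),            # keep only the top document per group
        ("group.main", "true"),        # flatten the groups into a plain list
    ]
    return urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/edm/chinese2/suggest"
    print(base + "?" + build_grouped_query("do our best"))
```

Grouping compares whole field values, so documents that are only partly identical would still fall into separate groups; this sketch only collapses exact duplicates.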
Re: Remove duplicate suggestions in Solr
Hi Edwin.

What you are doing here is "search"; Solr has separate components for doing suggestions.

About dedup:

- Have a look at the manual: https://cwiki.apache.org/confluence/display/solr/De-Duplication

- Or simply do your dedup upfront, before ingesting into Solr, by assigning the same "id" to all docs with the same "textng" (this may require a different index if you want to keep the existing data with duplicates for other purposes).

- Or you could use result grouping/field collapsing to group/dedup your results.

Hope this helps.

Arcadius.

On 21 August 2015 at 06:41, Zheng Lin Edwin Yeo wrote:
> Hi,
>
> I would like to check, is there anyway to remove duplicate suggestions in
> Solr?
> I have several documents that looks very similar, and when I do a
> suggestion query, it came back with all same results. I'm using Solr 5.2.1

--
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---
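The second option above (the same "id" for every doc with the same "textng") could be sketched roughly like this, run before the documents are sent to Solr. It is only an illustration under assumptions: the helper names, the case and whitespace normalization, and the choice of SHA-1 are not from this thread, and it presumes "id" is the uniqueKey of the index.

```python
import hashlib


def dedup_id(textng):
    """Derive a deterministic id from the text, so equal text gives equal id."""
    # Assumption: texts that differ only in case or spacing count as duplicates.
    normalized = " ".join(textng.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def assign_dedup_ids(docs):
    """Return copies of docs with "id" replaced by the text-derived id.

    Documents with the same "textng" end up with the same "id", so a later
    one overwrites an earlier one when both are indexed into the same core.
    """
    return [dict(doc, id=dedup_id(doc["textng"])) for doc in docs]


if __name__ == "__main__":
    docs = [
        {"id": "responsibility001", "textng": "We will strive to do our best."},
        {"id": "responsibility002", "textng": "We will strive to do our best."},
    ]
    unique_ids = {doc["id"] for doc in assign_dedup_ids(docs)}
    print(len(unique_ids))
```

Because the original ids are replaced, this fits the separate index mentioned above better than an index whose ids are needed for other purposes.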
Remove duplicate suggestions in Solr
Hi,

I would like to check, is there any way to remove duplicate suggestions in Solr? I have several documents that look very similar, and when I do a suggestion query, it comes back with all the same results. I'm using Solr 5.2.1.

This is my suggestion pipeline:

    all
    json
    true
    edismax
    10
    id, score
    content^50 title^50 extrasearch^30.0 textnge^50.0
    product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)
    1.0
    content_type:"application/pdf"
    0.9
    content_type:"application/msword"
    0.5
    content_type:"NA"
    0.0
    content_type:"NA"
    0.0
    on
    id, textng, textng2, language_s
    true
    true
    html
    50
    false

This is my query:

    http://localhost:8983/edm/chinese2/suggest?q=do our best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20 textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true

This is the suggestion result:

    "highlighting":{
      "responsibility001":{
        "id":["responsibility001"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility002":{
        "id":["responsibility002"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility003":{
        "id":["responsibility003"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility004":{
        "id":["responsibility004"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility005":{
        "id":["responsibility005"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility006":{
        "id":["responsibility006"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility007":{
        "id":["responsibility007"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility008":{
        "id":["responsibility008"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility009":{
        "id":["responsibility009"],
        "textng":["We will strive to do our best. <br> "],
      "responsibility010":{
        "id":["responsibility010"],
        "textng":["We will strive to do our best. <br> "],

Regards,
Edwin
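For a response shaped like the one above, one further possibility is a client-side fallback that drops repeated snippets after Solr returns. This is not something proposed in the thread. The sketch below assumes the "highlighting" section has already been parsed into a Python dict; the function name and the `textng` default are illustrative.

```python
def unique_snippets(highlighting, field="textng"):
    """Return the distinct highlight snippets for `field`, in first-seen order."""
    seen = set()
    result = []
    for fields in highlighting.values():
        for snippet in fields.get(field, []):
            key = snippet.strip()  # ignore leading/trailing whitespace
            if key not in seen:
                seen.add(key)
                result.append(snippet)
    return result


if __name__ == "__main__":
    highlighting = {
        "responsibility001": {"textng": ["We will strive to do our best. <br> "]},
        "responsibility002": {"textng": ["We will strive to do our best. <br> "]},
    }
    print(unique_snippets(highlighting))
```

Since the row limit applies before this filter runs, fewer suggestions than requested may remain once the duplicates are removed.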