Re: Remove duplicate suggestions in Solr

2015-08-23 Thread Zheng Lin Edwin Yeo
Hi Arcadius,

Thank you for your reply.

So this means that the de-duplication has to be done during indexing time,
and not during query time?

Yes, I'm currently building my suggestions on top of "search", as I faced
some issues with the suggestion components in Solr 5.1.0. Will the
suggestion components solve this issue of duplicate suggestions?

There might also be cases where about 1/2 to 3/4 of the content of my
indexed documents is the same, and only the remaining 1/4 to 1/2 is
different. So the documents in the index are different, but a search may
still return the part of the documents that is the same.
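
For this partial-duplicate case, one query-time option is to de-duplicate
the returned snippets on the client. Below is a minimal sketch, assuming the
"highlighting" section has the shape shown in the original message further
down. The function name unique_snippets and the whitespace normalisation are
illustrative assumptions, not part of Solr.

```python
def unique_snippets(highlighting, field="textng"):
    """Keep the first document for each distinct highlighted snippet.

    `highlighting` is the parsed "highlighting" object of a Solr JSON
    response: {doc_id: {field_name: [snippet, ...]}}.
    """
    seen = set()
    result = []
    for doc_id, fields in highlighting.items():
        for snippet in fields.get(field, []):
            # Collapse whitespace so line wrapping does not hide duplicates.
            key = " ".join(snippet.split())
            if key not in seen:
                seen.add(key)
                result.append((doc_id, snippet))
    return result


# Hypothetical sample data modelled on the response in this thread.
sample = {
    "responsibility001": {"textng": ["We will strive to do our best. <br> "]},
    "responsibility002": {"textng": ["We will strive to do our best.\n <br> "]},
    "other001": {"textng": ["We do our best every day. <br> "]},
}
suggestions = unique_snippets(sample)
```

Since duplicates are dropped after the fact, the request may need a larger
"rows" value to still end up with enough distinct suggestions.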


Regards,
Edwin




Re: Remove duplicate suggestions in Solr

2015-08-23 Thread Arcadius Ahouansou
Hi Edwin.

What you are doing here is "search"; Solr has separate components for
doing suggestions.

About dedup,

- Have a look at the manual:
https://cwiki.apache.org/confluence/display/solr/De-Duplication

- Or simply do your dedup upfront, before ingesting into Solr, by assigning
the same "id" to all docs with the same "textng" (this may require a
different index if you want to keep the existing data with duplicates for
other purposes)

- Or you could use result grouping/field collapsing to group/dedup your
results
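
The second option above can be sketched roughly as follows: derive the "id"
from the "textng" value, so that documents with the same text share one
uniqueKey and overwrite each other when indexed. This is only an
illustration. The helper names, the SHA-1 choice and the whitespace/case
normalisation are assumptions, not something Solr prescribes.

```python
import hashlib


def dedup_id(text):
    """Derive a stable id from the normalised suggestion text."""
    # Assumption: texts differing only in whitespace or case are duplicates.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def assign_dedup_ids(docs, field="textng"):
    """Return copies of the docs whose "id" is derived from `field`."""
    return [{**doc, "id": dedup_id(doc[field])} for doc in docs]


# Hypothetical documents modelled on the ones in this thread.
docs = [
    {"id": "responsibility001", "textng": "We will strive to do our best."},
    {"id": "responsibility002", "textng": "We will strive to do  our best."},
    {"id": "other001", "textng": "Something else."},
]
deduped = assign_dedup_ids(docs)
unique_ids = {doc["id"] for doc in deduped}
```

The original ids are replaced here, so keep them in another field if they are
still needed.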

Hope this helps

Arcadius.





-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Remove duplicate suggestions in Solr

2015-08-20 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check: is there any way to remove duplicate suggestions in
Solr?
I have several documents that look very similar, and when I do a
suggestion query, it comes back with all the same results. I'm using Solr
5.2.1.

This is my suggestion pipeline:




all
  json
  true


edismax
10
id, score


content^50 title^50 extrasearch^30.0
textnge^50.0


product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)
1.0

content_type:"application/pdf"
0.9
content_type:"application/msword"
0.5
content_type:"NA"
0.0
content_type:"NA"
0.0
  on
  id, textng, textng2, language_s
  true
  true
  html
  
  50
false



This is my query:
http://localhost:8983/edm/chinese2/suggest?q=do our best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20 textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true
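
The spaces and carets in the URL above need percent-encoding when the request
is sent from a client. A minimal sketch that builds the same request with
Python's standard library is below. The optional grouping parameters (group,
group.field, group.main) are an addition for illustration, and "textng_s" is
a hypothetical single-valued string copy of "textng" that does not exist in
the setup described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://localhost:8983/edm/chinese2/suggest"

# Parameters copied from the query above.
params = {
    "q": "do our best",
    "defType": "edismax",
    "qf": "content^5 textng^5",
    "pf": "textnge^50",
    "pf2": "content^20 textnge^50",
    "pf3": "content^40 textnge^50",
    "ps2": "2",
    "ps3": "2",
    "stats.calcdistinct": "true",
}


def build_url(base, params, group_field=None):
    """Return the request URL, optionally with result grouping enabled."""
    query = dict(params)
    if group_field is not None:
        # Assumption: group_field is a single-valued string field.
        query.update({
            "group": "true",
            "group.field": group_field,
            "group.main": "true",
        })
    return base + "?" + urlencode(query)


url = build_url(BASE_URL, params)
grouped_url = build_url(BASE_URL, params, group_field="textng_s")

# Round trip: decoding the query string gives the original values back.
decoded = parse_qs(urlsplit(url).query)
```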


This is the suggestion result:

 "highlighting":{
   "responsibility001":{
     "id":["responsibility001"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility002":{
     "id":["responsibility002"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility003":{
     "id":["responsibility003"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility004":{
     "id":["responsibility004"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility005":{
     "id":["responsibility005"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility006":{
     "id":["responsibility006"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility007":{
     "id":["responsibility007"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility008":{
     "id":["responsibility008"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility009":{
     "id":["responsibility009"],
     "textng":["We will strive to do our best. <br> "]},
   "responsibility010":{
     "id":["responsibility010"],
     "textng":["We will strive to do our best. <br> "]}}


Regards,
Edwin