Re: Dealing with multi-word keywords and SOW=true

2019-09-30 Thread Erick Erickson
You should not leave it in the qf parameter. You’re getting confused by the 
difference between query _parsing_ and the analysis chain. The parser turns 
your top-level query of “ice cream” (assuming no quotes) into something 
like

f1:ice f1:cream f2:ice f2:cream

This happens well before analysis takes over. What you need is for both 
“ice” and “cream” to be passed as a unit to the analysis chain, and if you rely 
on the qf parameter that won’t happen.
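
To make the distinction concrete, here is a toy Python model of the two stages. 
It is not Solr's parser: expand_terms and keyword_tokenizer are hypothetical 
helpers that only mimic whitespace splitting across the qf fields and a 
KeywordTokenizerFactory-style field that keeps its whole input as one token.

```python
def expand_terms(user_query, qf_fields):
    """Toy stand-in for parsing: pair each whitespace-separated term with each qf field."""
    terms = user_query.split()
    return [f"{field}:{term}" for field in qf_fields for term in terms]


def keyword_tokenizer(text):
    """Toy stand-in for a keyword-tokenized field: the whole input is one token."""
    return [text]


# What the index holds for a document whose keyword is "ice cream".
indexed_tokens = keyword_tokenizer("ice cream")

# Parsing splits the query before analysis ever sees it.
clauses = expand_terms("ice cream", ["f1", "f2"])
print(" ".join(clauses))

# Each split term is analyzed on its own, so neither equals the indexed token.
split_hits = [term for term in "ice cream".split() if term in indexed_tokens]

# A quoted phrase reaches analysis as a unit and yields the same single token.
phrase_hit = keyword_tokenizer("ice cream")[0] in indexed_tokens

print(split_hits, phrase_hit)
```

In this model the split terms find nothing in the keyword field, while the 
phrase passed as a unit does, which is the behaviour described above.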

Best,
Erick




Re: Dealing with multi-word keywords and SOW=true

2019-09-30 Thread Ashwin Ramesh
Thanks Erick, that seems to work!

Should I leave it in qf also? For example the query "blue dog" may be
represented as separate tokens in the keyword index.





Re: Dealing with multi-word keywords and SOW=true

2019-09-30 Thread Erick Erickson
Have you tried taking your keywords field out of the “qf” param and adding it 
explicitly, as keywords:"ice cream"?
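
A minimal sketch of what such a request might look like, assuming the other 
text fields are title and description (as in the original post) and that the 
explicit clause is simply appended to q. build_params is a hypothetical helper; 
how the extra clause combines with the rest of the query depends on your q.op 
and mm settings, and the raw user input is not escaped here beyond the quoted 
phrase.

```python
from urllib.parse import urlencode


def build_params(user_query):
    """Build edismax parameters with the keywords field queried explicitly as a phrase."""
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = user_query.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "defType": "edismax",
        # Free-text part goes through qf; the keywords clause is added explicitly.
        "q": f'{user_query} keywords:"{escaped}"',
        # Assumption: keywords is left out of qf, only the tokenized text fields remain.
        "qf": "title description",
    }


params = build_params("ice cream")
query_string = urlencode(params)
print(query_string)
```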

Best,
Erick

> On Sep 30, 2019, at 5:27 AM, Ashwin Ramesh  wrote:
> 
> Hi everybody,
> 
> I am using the edismax parser and have noticed a very specific behaviour
> with how sow=true (default) handles multiword keywords.
> 
> We have a field called 'keywords', which uses the standard
> KeywordTokenizerFactory. There are also other text fields such as title and
> description.
> 
> When we index a document with a keyword "ice cream", for example, we know
> it gets indexed into that field as "ice cream".
> 
> However, at query time, I noticed that if we run an Edismax query:
> q=ice cream
> qf=keywords
> 
> I do not get that document back as a match. This is due to sow=true
> splitting the user's query and the final tokens not being present in the
> keywords field.
> 
> I was wondering what the best practice around this is. Some thoughts I
> have had:
> 
> 1. Index multi-word keywords with hyphens or something similar. E.g. "ice
> cream" -> "ice-cream"
> 2. Additionally index the separate words as keywords also. E.g. "ice cream"
> -> "ice cream", "ice", "cream". However this method will result in the loss
> of intent (q=ice would return this document).
> 3. Add a boost query which is an edismax query where we explicitly set
> sow=false and add a huge boost. E.g. bq={!edismax qf=keywords^1000
> sow=false bq="" boost="" pf="" tie=1.00 v="ice cream"}
> 
> Is there an industry-standard solution to handle this type of problem? Keep
> in mind that the other text fields may also include these terms. E.g.
> title="This is ice cream", which would match the query. This specific
> problem affects the keywords field for the obvious reason that the indexing
> pipeline does not tokenize keywords.
> 
> Thank you for all your amazing help,
> 
> Regards,
> 
> Ash
> 