Re: Inventor-template vs Inventor template - issue with hyphen

2016-08-26 Thread Erick Erickson
This confuses a lot of people. The difference is at the top-level parser, way
before it gets to the analysis chain.

"Inventor-template" comes out of the top-level parser as a single token. From
there it goes through edismax etc. So it's a single token spread across your
fields by edismax. It's only during the field analysis that it's broken into
two tokens.

"Inventor template" is parsed as two distinct tokens and fed to edismax as two
tokens, where they're spread across your fields as a pair of words.
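
A rough way to picture the two stages described above is the sketch below. It is not Solr's implementation: the whitespace split stands in for the top-level parser, the regex split plus a crude trailing-"e" strip stands in for the field analysis chain (the real stemmer does much more), and the helper names are hypothetical. The field names and boosts are copied from the parsed queries in this thread; CommandSrch is left out because it behaves differently in the debug output.

```python
import re

# Fields and boosts as they appear in the parsed queries in this thread.
FIELDS = {"text": 1.5, "Description": 2.0, "title": 3.5, "keywords": 1.2}

def top_level_split(query):
    """Stand-in for the top-level parser: split on whitespace only."""
    return query.split()

def analyze(chunk):
    """Stand-in for field analysis: break on non-alphanumerics, lowercase,
    and strip a trailing 'e' as a crude substitute for real stemming."""
    words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", chunk) if w]
    return [w[:-1] if w.endswith("e") else w for w in words]

def field_clause(field, boost, terms):
    # Several terms coming out of ONE chunk become a phrase in this sketch.
    body = '"%s"' % " ".join(terms) if len(terms) > 1 else terms[0]
    return "%s:%s^%s" % (field, body, boost)

def build(query):
    """One dismax group per top-level chunk, each spread across all fields."""
    groups = []
    for chunk in top_level_split(query):
        terms = analyze(chunk)
        if not terms:  # chunk was punctuation only
            continue
        clauses = [field_clause(f, b, terms) for f, b in FIELDS.items()]
        groups.append("+(" + " | ".join(clauses) + ")")
    return " ".join(groups)

print(build("Inventor-template"))  # one group of phrase clauses
print(build("Inventor template"))  # two groups of single-term clauses
```

The hyphenated input reaches the per-field step as one chunk, so each field gets a two-word phrase; the space-separated input arrives as two chunks, so each word gets its own group across the fields.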

Best,
Erick




On Fri, Aug 26, 2016 at 8:09 AM, shamik <sham...@gmail.com> wrote:

> Anyone ?
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Inventor-template-vs-Inventor-template-issue-with-hyphen-tp4293357p4293489.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Inventor-template vs Inventor template - issue with hyphen

2016-08-26 Thread shamik
Anyone ?





Re: Inventor-template vs Inventor template - issue with hyphen

2016-08-25 Thread shamik
Thanks Erick. I did look at the analysis tool and the debug query output, and
posted the results in my original message. WDF is correctly stripping the "-"
from Inventor-template; both terms are broken down to "inventor templat". But
I'm not sure why the query is constructed differently at query time. Here are
the parsed queries:

*Inventor-template*


(+DisjunctionMaxQuery(((+CommandSrch:inventor +CommandSrch:templat) |
text:"inventor templat"^1.5 | Description:"inventor templat"^2.0 |
title:"inventor templat"^3.5 | keywords:"inventor templat"^1.2)~0.01)
Source2:sfdcarticles^9.0 Source2:downloads^5.0
FunctionQuery(1.0/(3.16E-11*float(ms(const(147216960),date(PublishDate)))+1.0)))/no_coord



+((+CommandSrch:inventor +CommandSrch:templat) | text:"inventor templat"^1.5
| Description:"inventor templat"^2.0 | title:"inventor templat"^3.5 |
keywords:"inventor templat"^1.2)~0.01 Source2:sfdcarticles^9.0
Source2:downloads^5.0 
1.0/(3.16E-11*float(ms(const(147216960),date(PublishDate)))+1.0)


*Inventor template*


(+(+DisjunctionMaxQuery((CommandSrch:inventor | text:inventor^1.5 |
Description:inventor^2.0 | title:inventor^3.5 | keywords:inventor^1.2)~0.01)
+DisjunctionMaxQuery((CommandSrch:templat | text:templat^1.5 |
Description:templat^2.0 | title:templat^3.5 | keywords:templat^1.2)~0.01))
Source2:sfdcarticles^9.0 Source2:downloads^5.0
FunctionQuery(1.0/(3.16E-11*float(ms(const(147216960),date(PublishDate)))+1.0)))/no_coord



+(+(CommandSrch:inventor | text:inventor^1.5 | Description:inventor^2.0 |
title:inventor^3.5 | keywords:inventor^1.2)~0.01 +(CommandSrch:templat |
text:templat^1.5 | Description:templat^2.0 | title:templat^3.5 |
keywords:templat^1.2)~0.01) Source2:sfdcarticles^9.0 Source2:downloads^5.0 
1.0/(3.16E-11*float(ms(const(147216960),date(PublishDate)))+1.0)


The part I'm confused about is why the two queries are being interpreted
differently.

Thanks,
Shamik



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Inventor-template-vs-Inventor-template-issue-with-hyphen-tp4293357p4293380.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Inventor-template vs Inventor template - issue with hyphen

2016-08-25 Thread Erick Erickson
Look at your admin/analysis page. WordDelimiterFilterFactory breaks on
non-alphanumeric characters. Also, adding &debug=query to the request will
show you the parsed form of the query, and that'll help.
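
As a minimal sketch of that suggestion, the snippet below builds the two request URLs with debug=query set, so the parsed forms can be compared side by side. The host, port, and collection name are assumptions for illustration only; qf and the other edismax settings are assumed to come from the request handler configuration.

```python
from urllib.parse import urlencode

# Assumed for illustration: a local Solr and a collection called
# "mycollection". Substitute your own host and collection.
BASE = "http://localhost:8983/solr/mycollection/select"

def debug_url(user_query):
    """Return a select URL that asks Solr to include the parsed query."""
    params = {
        "q": user_query,
        "defType": "edismax",
        "debug": "query",  # adds the parsed query to the response
        "wt": "json",
    }
    return BASE + "?" + urlencode(params)

for q in ("Inventor-template", "Inventor template"):
    print(debug_url(q))
```

Fetching both URLs and comparing the parsedquery entries in the debug section shows where the two inputs diverge.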



Inventor-template vs Inventor template - issue with hyphen

2016-08-25 Thread Shamik Bandopadhyay
Hi,

  I'm trying to figure out the search behaviour for two similar terms, one
with a hyphen and one without. They generate different result sets: the
search without the hyphen brings back more results than the other. Here's
the fieldtype definition:

[fieldtype definition not preserved in the archive]

If I run the search terms through the analyzer, the final output for both
terms (with and without the hyphen) is --> *inventor templat*

I was under the impression that, based on my analyzers, both search terms
would produce the same results.

Here's the output from debug and splainer.

*Inventor-template*

(+DisjunctionMaxQuery(((+CommandSrch:inventor
+CommandSrch:templat) | text:"inventor templat"^1.5 | Description:"inventor
templat"^2.0 | title:"inventor templat"^3.5 | keywords:"inventor
templat"^1.2)~0.01) Source2:sfdcarticles^9.0 Source2:downloads^5.0
FunctionQuery(1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)))/no_coord

+((+CommandSrch:inventor
+CommandSrch:templat) | text:"inventor templat"^1.5 | Description:"inventor
templat"^2.0 | title:"inventor templat"^3.5 | keywords:"inventor
templat"^1.2)~0.01
1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)

From Splainer:

10.974786 Sum of the following:
 9.203462 Dismax (max plus:0.01 times others)
   9.198681 title:"inventor templat"

   0.4781131 text:"inventor templat"

 1.7644342 Source2:sfdcarticles

 0.006889837 1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)
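
The "max plus:0.01 times others" line can be checked by hand. The sketch below assumes the usual dismax combination (best field score plus the tie value times the sum of the remaining field scores) and plugs in the numbers from the breakdown above; the function name is hypothetical.

```python
import math

def dismax(scores, tie):
    """Best field score plus `tie` times the sum of the other field scores."""
    best = max(scores)
    return best + tie * (sum(scores) - best)

TIE = 0.01  # the "~0.01" in the parsed query

# title and text phrase scores from the Splainer output
phrase_group = dismax([9.198681, 0.4781131], TIE)

# add the Source2 boost and the date function score
total = phrase_group + 1.7644342 + 0.006889837

print(round(phrase_group, 6))
print(round(total, 6))
```

Up to floating-point rounding, the two values agree with the 9.203462 and 10.974786 reported by Splainer.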


*Inventor template*

(+(+DisjunctionMaxQuery((CommandSrch:inventor |
text:inventor^1.5 | Description:inventor^2.0 | title:inventor^3.5 |
keywords:inventor^1.2)~0.01) +DisjunctionMaxQuery((CommandSrch:templat |
text:templat^1.5 | Description:templat^2.0 | title:templat^3.5 |
keywords:templat^1.2)~0.01)) Source2:sfdcarticles^9.0 Source2:downloads^5.0
FunctionQuery(1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)))/no_coord

+(+(CommandSrch:inventor |
text:inventor^1.5 | Description:inventor^2.0 | title:inventor^3.5 |
keywords:inventor^1.2)~0.01 +(CommandSrch:templat | text:templat^1.5 |
Description:templat^2.0 | title:templat^3.5 | keywords:templat^1.2)~0.01)
Source2:sfdcarticles^9.0 Source2:downloads^5.0
1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)

From Splainer:

9.915069 Sum of the following:
 5.03947 Dismax (max plus:0.01 times others)
   5.038846 title:templat

   0.062400598 text:templat

 4.767776 Dismax (max plus:0.01 times others)
   4.7674117 title:inventor

   0.03642158 text:inventor

 0.098686054 Source2:CloudHelp

 0.009136423 1.0/(3.16E-11*float(ms(const(147208320),date(PublishDate)))+1.0)


I'm using edismax.


Just wondering what I'm missing here. Any help will be appreciated.

Regards,
Shamik