Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-10 Thread geeky2
hello,

>>
Or does your field in schema.xml have anything like
autoGeneratePhraseQueries="true" in it?
<<

there is no reference to this in our production schema.

this is extremely confusing.

i am not completely clear on the issue?

reviewing our previous messages - it looks like the data is being tokenized
correctly according to the analysis page and output from Luke.

it also looks like the definition of the field and field type is correct in
the schema.xml

it also looks like there is no errant data (quotes) being introduced into
the query string submitted to solr:

example:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select?indent=on&version=2.2&q=itemNo%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

so - does the real issue reside in HOW the query is being constructed /
parsed ???

and if so - what drives this query to become a MultiPhraseQuery with
embedded quotes?

rawquerystring: itemNo:BP21UAA
querystring: itemNo:BP21UAA
parsedquery: MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")
parsedquery_toString: itemNo:"bp 21 (uaa bp21uaa)"

please note - i also mocked up a simple test on my personal linux box - just
using the solr 3.5 distro (we are using 3.3.0 on our production box under
centOS)

i was able to get a simple test to work and yes - my query does look
different

output from my simple mock up on my personal box:

http://localhost:8983/solr/select?indent=on&version=2.2&q=manu%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

rawquerystring: manu:BP21UAA
querystring: manu:BP21UAA
parsedquery: manu:bp manu:21 manu:uaa manu:bp21uaa
parsedquery_toString: manu:bp manu:21 manu:uaa manu:bp21uaa

schema.xml (the field definition pasted here was not preserved in the archive)

any suggestions would be greatly appreciated.

mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3733486.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-09 Thread Erick Erickson
Hmmm, try looking at anything you've done in solrconfig.xml
to the request handler (probably called "search") with
default="true" set.

Or does your field in schema.xml have anything like
autoGeneratePhraseQueries="true" in it?

Best
Erick



Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-09 Thread geeky2

>>
OK, first question is why are you searching on two different values?
Is that intentional? 
<<

yes - our users have to be able to locate a part or model number (that may
or may not have periods in that number) even if they do NOT enter the number
with the embedded periods.  

example: 

actual part number in our database is BP2.1UAA

however the user needs to be able to search on BP21UAA and find that part.

there are business reasons why a user may see something different in the
field than is actually in the database.

does this make sense?



>>
If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right? 
<<

yes - simply put - the user MUST get a hit on the above mentioned part if
they enter BP21UAA or BP2.1UAA.

>>
But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
<<

sorry - i did not know i was doing this or how it happened - it was not
intentional and i did not notice this until your posting.  i am not sure of
the implications related to this or what it means to have something as a
MultiPhraseQuery.

>>
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?
<<

i did not use quotes in the url - i cut and pasted the urls for my tests in
the message thread.  i do not see quotes as part of the url in my previous
post.

what would i be looking for in the solrconfig.xml file that would force the
MultiPhraseQuery?

it seems that this is the crux of the issue - but i am not sure how to
determine what is manifesting the quotes?  as previously stated - the quotes
are not being entered via the url - they are pasted (in this message thread)
exactly as i pulled them from the browser.

thank you,
mark







Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-09 Thread Erick Erickson
OK, first question is why are you searching on two different values?
Is that intentional? If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right?

But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as "MultiPhraseQuery"?
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?

Because this works fine for me, using your schema definition and
without using quotes. I get, however, this as the parsed query:
eoe:b eoe:12 eoe:0123 eoe:120123 eoe:b120123
not a phrase in sight.

If I *do* put quotes around the version without the period, I get
no results returned and a MultiPhraseQuery.

Best
Erick
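
The contrast Erick describes (separate term clauses find the document, a quoted phrase does not) can be reproduced with a small position model. This is only a sketch: it assumes the token positions that the analysis page reported for BP2.1UAA and BP21UAA elsewhere in this thread, lowercased as in the parsed queries, and it models an exact phrase with no slop, which simplifies what Lucene's MultiPhraseQuery actually does. The function names are illustrative and are not Solr or Lucene APIs.

```python
# Token positions reported by the analysis page for the indexed value BP2.1UAA
# (lowercased here, as in the parsed queries shown in the thread).
INDEXED = {1: {"bp"}, 2: {"2"}, 3: {"1", "21"}, 4: {"uaa", "bp21uaa"}}

# Query-side tokens; each inner set holds the terms that share one position.
QUERY_NO_PERIOD = [{"bp"}, {"21"}, {"uaa", "bp21uaa"}]            # BP21UAA
QUERY_WITH_PERIOD = [{"bp"}, {"2"}, {"1", "21"}, {"uaa", "bp21uaa"}]  # BP2.1UAA


def phrase_match(indexed, query):
    """True if every query position lines up at consecutive index positions."""
    return any(
        all(indexed.get(start + offset, set()) & terms
            for offset, terms in enumerate(query))
        for start in indexed
    )


def any_term_match(indexed, query):
    """True if at least one query term occurs anywhere in the indexed value."""
    all_indexed = set().union(*indexed.values())
    return any(terms & all_indexed for terms in query)


if __name__ == "__main__":
    print("phrase, no period:  ", phrase_match(INDEXED, QUERY_NO_PERIOD))
    print("phrase, with period:", phrase_match(INDEXED, QUERY_WITH_PERIOD))
    print("terms,  no period:  ", any_term_match(INDEXED, QUERY_NO_PERIOD))
```

Under these assumptions the phrase form of BP21UAA fails because bp sits at position 1 and 21 at position 3 in the indexed value, so the two are never adjacent, while the term-by-term form succeeds because bp21uaa is present in the index.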





Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread geeky2
hello,

thanks for sticking with me on this ...very frustrating 

ok - i did perform the query with the debug parms using two scenarios:

1) a successful search (where i insert the period / dot) in to the itemNo
field and the search returns a document.

itemNo:BP2.1UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP2.1UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

results from debug

responseHeader:
  status: 0
  QTime: 1
  params:
    indent: on
    rows: 10
    version: 2.2
    debugQuery: on
    start: 0
    q: itemNo:BP2.1UAA

doc:
  PHILIPS
  0333500
  0333500,1549  ,BP2.1UAA
  PLASMA TELEVISION
  BP2.1UAA
  2
  BP2.1UAA
  Plasma Television^
  0
  1549

debug:
  rawquerystring: itemNo:BP2.1UAA
  querystring: itemNo:BP2.1UAA
  parsedquery: MultiPhraseQuery(itemNo:"bp 2 (1 21) (uaa bp21uaa)")
  parsedquery_toString: itemNo:"bp 2 (1 21) (uaa bp21uaa)"
  explain:
    22.539911 = (MATCH) weight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
      0.9994 = queryWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)"), product of:
        45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
        0.02218287 = queryNorm
      22.539913 = (MATCH) fieldWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
        1.0 = tf(phraseFreq=1.0)
        45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
        0.5 = fieldNorm(field=itemNo, doc=134993)
  QParser: LuceneQParser
  timing: 1.0 total; prepare 0.0 (all six components 0.0); process 1.0 (first component 1.0, other five 0.0)

2) a NON-successful search (where i do NOT insert a period / dot) in to the
itemNo field and the search does NOT return a document

 itemNo:BP21UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP21UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on





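responseHeader:
  status: 0
  QTime: 1
  params:
    indent: on
    rows: 10
    version: 2.2
    debugQuery: on
    start: 0
    q: itemNo:BP21UAA

result: (no documents)

debug:
  rawquerystring: itemNo:BP21UAA
  querystring: itemNo:BP21UAA
  parsedquery: MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")
  parsedquery_toString: itemNo:"bp 21 (uaa bp21uaa)"
  QParser: LuceneQParser
  timing: 1.0 total; prepare 1.0 (first component 1.0, other five 0.0); process 0.0 (all six components 0.0)
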

the parsedquery part of the debug output looks like it DOES contain the term
that i am entering for my search criteria on the itemNo field ??

does this make sense?

thank you,
mark





Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread Erick Erickson
Hmmm, that all looks correct, from the output you pasted I'd expect
you to be finding the doc.

So next thing: add &debugQuery=on to your query and look at
the debug information after the list of documents, particularly
the "parsedQuery" bit. Are you searching against the fields you
think you are? If you don't specify a field, Solr uses the default
defined in schema.xml.

Next, look at your actual index using either Luke or the TermsComponent
to see what's actually *in* your index rather than what you *think* is there. I
can't tell you how many times I've made the wrong assumptions.

My guess would be that you aren't searching the fields you think you are...

Best
Erick
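
The debugQuery step suggested above can be scripted so that the field is always named explicitly and the debug section is always requested. This is a minimal sketch: the helper name build_debug_url is made up for illustration, and the base URL in the usage lines is a hypothetical local instance rather than the production host. It only builds the URL; fetching it and reading the parsed query from the debug section is left to the browser or an HTTP client.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, field, value, rows=10):
    """Return a select URL that searches one explicit field with debugQuery=on."""
    params = {
        "q": f"{field}:{value}",  # name the field so the default field is not used
        "start": 0,
        "rows": rows,
        "indent": "on",
        "debugQuery": "on",       # asks Solr to append the debug section
    }
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local instance; substitute the real host and core path.
    print(build_debug_url("http://localhost:8983/solr", "itemNo", "BP21UAA"))
    print(build_debug_url("http://localhost:8983/solr", "itemNo", "BP2.1UAA"))
```

Running both variants side by side makes it easy to compare the parsed query for the value with and without the period.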



Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread geeky2
hello,

thank you for the reply.

yes - i did re-index after the changes to the schema.

also - thank you for the direction on using the analyzer - but i am not sure
if i am interpreting the feedback from the analyzer correctly.

here is what i did:

in the Field value (Index) box - i placed this: BP2.1UAA

in the Field value (Query) box - i placed this: BP21UAA

then after hitting the Analyze button - i see the following:

Under Index Analyzer for: 

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position    1     2     3     4
term text   BP    2     1     UAA
                        21    BP21UAA

Under Query Analyzer for:

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position    1     2     3
term text   BP    21    UAA
                        BP21UAA

the above information leads me to believe that i "should" have BP21UAA as an
indexed term generated from the BP2.1UAA value coming from the database.

also - the query analysis leads me to believe that i "should" find a document
when i search on BP21UAA in the itemNo field

do i have this correct?

am i missing something here?

i am still unable to get a hit when i search on BP21UAA in the itemNo field.

thank you,
mark



Re: struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-08 Thread Erick Erickson
Hmmm, seems OK. Did you re-index after any
schema changes?

You'll learn to love admin/analysis for questions like this,
that page should show you what the actual tokenization
results are, make sure to click the "verbose" check boxes.

Best
Erick

On Tue, Feb 7, 2012 at 10:52 PM, geeky2  wrote:
> this is snipped from my schema.xml file
>
>
>  
>     ...
>    
>     ...
>  
>
>
>
>
>     ... positionIncrementGap="100">
>         ... ignoreCase="true" expand="true"/>
>         ... words="stopwords.txt"/>
>         ... generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>         ... protected="protwords.txt"/>
>
>         ... words="stopwords.txt"/>
>         ... generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>         ... protected="protwords.txt"/>
>
>
>
>


struggling with solr.WordDelimiterFilterFactory and periods "." or dots

2012-02-07 Thread geeky2
hello all,

i am struggling with getting solr.WordDelimiterFilterFactory to behave as is
indicated in the solr book (Smiley) on page 54.

the example in the book reads like this:

>>
Here is an example exercising all options:
WiFi-802.11b to Wi, Fi, WiFi, 802, 11, 80211, b, WiFi80211b
<<

essentially - i have the same requirement with embedded periods and need to
return a successful search on a field, even if the user does NOT enter the
period.

i have a field, itemNo that can contain periods ".".

example content in the itemNo field:

B12.0123

when the user searches on this field, they need to be able to enter an
itemNo without the period, and still find the item.

example:

user enters: B120123 and a document is returned with B12.0123.


unfortunately, the search will NOT return the appropriate document, if the
user enters B120123.

however - the search does work if the user enters B12 0123 (a space in place
of the period).
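
For reference, the expansion quoted from the book can be approximated in a few lines. This is a simplified illustration and not Solr's WordDelimiterFilter implementation: it assumes all of the generate/catenate options and splitOnCaseChange are enabled, ignores token positions and lowercasing, and the function names are invented for the sketch.

```python
import re
from itertools import groupby


def split_parts(token):
    """Split on non-alphanumerics, letter/digit boundaries, and lower-to-upper case changes."""
    marked = re.sub(
        r"(?<=[a-z])(?=[A-Z])|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])",
        " ",
        token,
    )
    return re.findall(r"[A-Za-z0-9]+", marked)


def expand(token):
    """Emit the parts, each catenated word or number run, and the catenation of everything."""
    parts = split_parts(token)
    out = []
    for _is_number, run in groupby(parts, key=str.isdigit):
        run = list(run)
        out.extend(run)
        if len(run) > 1:  # stands in for catenateWords / catenateNumbers
            out.append("".join(run))
    whole = "".join(parts)
    if len(parts) > 1 and whole not in out:  # stands in for catenateAll
        out.append(whole)
    return out


if __name__ == "__main__":
    print(expand("WiFi-802.11b"))
    print(expand("B12.0123"))
    print(expand("B120123"))
```

In this model the value B12.0123 produces B120123 among its tokens, which is why a search for B120123 would be expected to find it when the query terms are matched individually.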

can someone help me understand what is missing from my configuration?


this is snipped from my schema.xml file


  
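 ...

 ...

... positionIncrementGap="100">
    ... ignoreCase="true" expand="true"/>
    ... words="stopwords.txt"/>
    ... generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    ... protected="protwords.txt"/>

    ... words="stopwords.txt"/>
    ... generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    ... protected="protwords.txt"/>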




