subject:"struggling with solr.WordDelimiterFilterFactory and periods . or dots"

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-10 Thread geeky2

hello,


> Or does your field in schema.xml have anything like
> autoGeneratePhraseQueries=true in it?


there is no reference to this in our production schema.

this is extremely confusing.

i am not completely clear on the issue?

reviewing our previous messages - it looks like the data is being tokenized
correctly according to the analysis page and output from Luke.

it also looks like the definition of the field and field type is correct in
the schema.xml

it also looks like there is no errant data (quotes) being introduced in to
the query string submitted to solr:

example:

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select?indent=on&version=2.2&q=itemNo%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

so - does the real issue reside in HOW the query is being constructed /
parsed?

and if so - what drives this query to become a MultiPhraseQuery with
embedded quotes?

<lst name="debug">
  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="querystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>

please note - i also mocked up a simple test on my personal linux box - just
using the solr 3.5 distro (we are using 3.3.0 on our production box under
centOS)

i was able to get a simple test to work and yes - my query does look
different

output from my simple mock up on my personal box:

http://localhost:8983/solr/select?indent=on&version=2.2&q=manu%3ABP21UAA&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&explainOther=&hl.fl=

<lst name="debug">
  <str name="rawquerystring">manu:BP21UAA</str>
  <str name="querystring">manu:BP21UAA</str>
  <str name="parsedquery">manu:bp manu:21 manu:uaa manu:bp21uaa</str>
  <str name="parsedquery_toString">manu:bp manu:21 manu:uaa manu:bp21uaa</str>
  <lst name="explain">
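
A possible reading of the two parsed queries above, offered as a sketch rather than a diagnosis: the production query is a phrase, so term positions matter, while the mock-up query is a plain OR of terms, so they do not. The Python below is a toy model, not Solr code. The positions are copied from the analysis page output quoted elsewhere in this thread, terms are lowercased by hand, stemming is ignored, and the function names are made up for illustration.

```python
# Token positions for the indexed value BP2.1UAA, as shown by the analysis
# page in this thread (lowercased by hand; stemming ignored).
INDEXED = {1: {"bp"}, 2: {"2"}, 3: {"1", "21"}, 4: {"uaa", "bp21uaa"}}

# Query-side tokens, one set of alternatives per consecutive position.
QUERY_NO_DOT = [{"bp"}, {"21"}, {"uaa", "bp21uaa"}]                # itemNo:BP21UAA
QUERY_WITH_DOT = [{"bp"}, {"2"}, {"1", "21"}, {"uaa", "bp21uaa"}]  # itemNo:BP2.1UAA


def phrase_match(indexed, query):
    """True if each query position lines up with consecutive indexed positions (no slop)."""
    return any(
        all(alts & indexed.get(start + offset, set()) for offset, alts in enumerate(query))
        for start in indexed
    )


def boolean_or_match(indexed, query):
    """True if any query term occurs anywhere in the field; positions are ignored."""
    all_indexed = set().union(*indexed.values())
    return any(alts & all_indexed for alts in query)


if __name__ == "__main__":
    print("phrase, no dot:  ", phrase_match(INDEXED, QUERY_NO_DOT))
    print("phrase, with dot:", phrase_match(INDEXED, QUERY_WITH_DOT))
    print("OR, no dot:      ", boolean_or_match(INDEXED, QUERY_NO_DOT))
```

Under these assumptions the phrase form of BP21UAA fails because "21" is indexed at position 3 while the query expects it at position 2, whereas the OR form matches on any single term.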

schema.xml

<fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

<field name="manu" type="text_en_splitting" indexed="true" stored="true" omitNorms="true"/>

any suggestions would be greatly appreciated.

mark




--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3733486.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-09 Thread Erick Erickson

OK, first question is why are you searching on two different values?
Is that intentional? If I'm reading your problem right, you should
be able to get/not get any response just by toggling whether the
period is in the search URL, right?

But assuming that's not the problem, there's something you're
not telling us. In particular, why is this parsing as MultiPhraseQuery?
Are you putting quotes in somehow, either through the URL or by
something in your solrconfig.xml?

Because this works fine for me, using your schema definition and
without using quotes. I get, however, this as the parsed query:
eoe:b eoe:12 eoe:0123 eoe:120123 eoe:b120123
not a phrase in sight.

If I *do* put quotes around the version without the period, I get
no results returned and a MultiPhraseQuery.
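
For anyone comparing runs, here is a small sketch of pulling the parsedquery string out of a debugQuery=on response and checking whether it is a phrase query. It uses only the Python standard library. The SAMPLE document is a trimmed, hand-written stand-in for a real response (including the double quotes, which are assumed to have been stripped from the pasted output), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the debug section of a Solr XML response.
SAMPLE = """<response>
<lst name="debug">
  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
</lst>
</response>"""


def parsed_query(xml_text):
    """Return the text of debug/parsedquery, or None if it is absent."""
    root = ET.fromstring(xml_text)
    node = root.find("./lst[@name='debug']/str[@name='parsedquery']")
    return node.text if node is not None else None


def is_phrase_query(parsed):
    """True if the parsed query string is rendered as a (Multi)PhraseQuery."""
    return parsed.startswith(("PhraseQuery(", "MultiPhraseQuery("))


if __name__ == "__main__":
    pq = parsed_query(SAMPLE)
    print(pq, "-> phrase" if is_phrase_query(pq) else "-> not a phrase")
```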

Best
Erick




Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-09 Thread geeky2

> OK, first question is why are you searching on two different values?
> Is that intentional?

yes - our users have to be able to locate a part or model number (that may
or may not have periods in that number) even if they do NOT enter the number
with the embedded periods.

example:

actual part number in our database is BP2.1UAA

however the user needs to be able to search on BP21UAA and find that part.

there are business reasons why a user may see something different in the
field than is actually in the database.

does this make sense?

> If I'm reading your problem right, you should
> be able to get/not get any response just by toggling whether the
> period is in the search URL, right?

yes - simply put - the user MUST get a hit on the above-mentioned part if
they enter BP21UAA or BP2.1UAA.

> But assuming that's not the problem, there's something you're
> not telling us. In particular, why is this parsing as MultiPhraseQuery?

sorry - i did not know i was doing this or how it happened - it was not
intentional and i did not notice this until your posting. i am not sure of
the implications related to this or what it means to have something as a
MultiPhraseQuery.

> Are you putting quotes in somehow, either through the URL or by
> something in your solrconfig.xml?

i did not use quotes in the url - i cut and pasted the urls for my tests in
the message thread. i do not see quotes as part of the url in my previous
post.

what would i be looking for in the solrconfig.xml file that would force the
MultiPhraseQuery?

it seems that this is the crux of the issue - but i am not sure how to
determine what is introducing the quotes. as previously stated - the quotes
are not being entered via the url - the urls are pasted (in this message
thread) exactly as i pulled them from the browser.

thank you,
mark

--
View this message in context:
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3730070.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-09 Thread Erick Erickson

Hmmm, try looking at anything you've done in solrconfig.xml to the
request handler (probably called "search") that has default="true" set.

Or does your field in schema.xml have anything like
autoGeneratePhraseQueries=true in it?
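
One way to check this is to list the schema version and each fieldType's autoGeneratePhraseQueries attribute. This is a sketch using the Python standard library; SAMPLE_SCHEMA is a made-up fragment, not the poster's schema, and the function name is hypothetical. As far as I recall, when the attribute is absent Solr 3.x falls back to a default that depends on the schema version (phrase generation on for versions below 1.4), so treat that as something to verify against the comments in the example schema.xml.

```python
import xml.etree.ElementTree as ET

# Made-up schema fragment for illustration only.
SAMPLE_SCHEMA = """<schema name="example" version="1.3">
  <types>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"/>
    <fieldType name="text_en_splitting" class="solr.TextField"
               autoGeneratePhraseQueries="true"/>
  </types>
</schema>"""


def phrase_settings(schema_xml):
    """Return (schema version, {fieldType name: autoGeneratePhraseQueries or None})."""
    root = ET.fromstring(schema_xml)
    settings = {
        ft.get("name"): ft.get("autoGeneratePhraseQueries")
        for ft in root.iter("fieldType")
    }
    return root.get("version"), settings


if __name__ == "__main__":
    version, settings = phrase_settings(SAMPLE_SCHEMA)
    print("schema version:", version)
    for name, value in settings.items():
        print(f"  {name}: autoGeneratePhraseQueries={value}")
```

A value of None means the attribute is not set on that fieldType and the version-dependent default applies.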

Best
Erick


Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread Erick Erickson

Hmmm, seems OK. Did you re-index after any
schema changes?

You'll learn to love admin/analysis for questions like this,
that page should show you what the actual tokenization
results are, make sure to click the verbose check boxes.

Best
Erick


Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread geeky2

hello,

thank you for the reply.

yes - i did re-index after the changes to the schema.

also - thank you for the direction on using the analyzer - but i am not sure
if i am interpreting the feedback from the analyzer correctly.

here is what i did:

in the Field value (Index) box - i placed this: BP2.1UAA

in the Field value (Query) box - i placed this: BP21UAA

then after hitting the Analyze button - i see the following:

Under Index Analyzer for: 

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position    1     2     3     4
term text   BP    2     1     UAA
                        21    BP21UAA

Under Query Analyzer for:

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1,
generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_33,
generateWordParts=1, catenateAll=1, catenateNumbers=1}

i see 

position    1     2     3
term text   BP    21    UAA
                        BP21UAA

the above information leads me to believe that i should have BP21UAA as an
indexed term generated from the BP2.1UAA value coming from the database.

also - the query analysis leads me to believe that i should find a document
when i search on BP21UAA in the itemNo field.

do i have this correct?

am i missing something here?

i am still unable to get a hit when i search on BP21UAA in the itemNo field.

thank you,
mark

--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3726021.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread Erick Erickson

Hmmm, that all looks correct, from the output you pasted I'd expect
you to be finding the doc.

So next thing: add debugQuery=on to your query and look at
the debug information after the list of documents, particularly
the parsedQuery bit. Are you searching against the fields you
think you are? If you don't specify a field, Solr uses the default
defined in schema.xml.
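
A minimal sketch of building such a request with the field named explicitly and debugQuery=on, so the parameters are escaped and joined correctly. It uses only the Python standard library; the base URL is the stock localhost example and the helper name is made up for illustration.

```python
from urllib.parse import urlencode


def solr_select_url(base, field, value, rows=10):
    """Build a /select URL that searches one named field with debug output enabled."""
    params = {
        "q": f"{field}:{value}",  # name the field so the schema default is not used
        "start": 0,
        "rows": rows,
        "indent": "on",
        "debugQuery": "on",
    }
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(solr_select_url("http://localhost:8983/solr", "itemNo", "BP21UAA"))
```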

Next, look at your actual index using either Luke or the TermsComponent
to see what's actually *in* your index rather than what you *think* is. I
can't tell you how many times I've made the wrong assumptions.
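
As a sketch of the TermsComponent route: the URL below asks for indexed terms in one field that start with a given prefix. It assumes a /terms request handler is registered in solrconfig.xml (as in the example configuration), and the base URL and helper name are illustrative only. The prefix is given in lowercase because the field's analyzer lowercases terms at index time.

```python
from urllib.parse import urlencode


def terms_url(base, field, prefix, limit=20):
    """Build a TermsComponent request listing indexed terms with the given prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.prefix": prefix,
        "terms.limit": limit,
    }
    return f"{base}/terms?{urlencode(params)}"


if __name__ == "__main__":
    print(terms_url("http://localhost:8983/solr", "itemNo", "bp21"))
```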

My guess would be that you aren't searching the fields you think you are...

Best
Erick


Re: struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-08 Thread geeky2

hello,

thanks for sticking with me on this ...very frustrating 

ok - i did perform the query with the debug parms using two scenarios:

1) a successful search (where i insert the period / dot) in to the itemNo
field and the search returns a document.

itemNo:BP2.1UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP2.1UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

results from debug

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">itemNo:BP2.1UAA</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <arr name="brand"><str>PHILIPS</str></arr>
    <str name="groupId">0333500</str>
    <str name="id">0333500,1549  ,BP2.1UAA   </str>
    <str name="itemDesc">PLASMA TELEVISION</str>
    <str name="itemNo">BP2.1UAA   </str>
    <int name="itemType">2</int>
    <arr name="model"><str>BP2.1UAA   </str></arr>
    <arr name="productType"><str>Plasma Television^</str></arr>
    <int name="rankNo">0</int>
    <str name="supplierId">1549  </str>
  </doc>
</result>
<lst name="debug">
  <str name="rawquerystring">itemNo:BP2.1UAA</str>
  <str name="querystring">itemNo:BP2.1UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 2 (1 21) (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 2 (1 21) (uaa bp21uaa)"</str>
  <lst name="explain">
    <str name="0333500,1549  ,BP2.1UAA   ">
22.539911 = (MATCH) weight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
  0.9994 = queryWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)"), product of:
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.02218287 = queryNorm
  22.539913 = (MATCH) fieldWeight(itemNo:"bp 2 (1 21) (uaa bp21uaa)" in 134993), product of:
    1.0 = tf(phraseFreq=1.0)
    45.079826 = idf(itemNo: bp=829 2=29303 1=43943 21=6716 uaa=32 bp21uaa=1)
    0.5 = fieldNorm(field=itemNo, doc=134993)
</str>
  </lst>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">1.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
  </lst>
</lst>
</response>







2) a NON-successful search (where i do NOT insert a period / dot) in to the
itemNo field and the search does NOT return a document

 itemNo:BP21UAA

http://hfsthssolr1.intra.searshc.com:8180/solrpartscat/core1/select/?q=itemNo%3ABP21UAA&version=2.2&start=0&rows=10&indent=on&debugQuery=on

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">1</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
    <str name="debugQuery">on</str>
    <str name="start">0</str>
    <str name="q">itemNo:BP21UAA</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">itemNo:BP21UAA</str>
  <str name="querystring">itemNo:BP21UAA</str>
  <str name="parsedquery">MultiPhraseQuery(itemNo:"bp 21 (uaa bp21uaa)")</str>
  <str name="parsedquery_toString">itemNo:"bp 21 (uaa bp21uaa)"</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">1.0</double>
    <lst name="prepare">
      <double name="time">1.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">1.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
  lst

struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-07 Thread geeky2

hello all,

i am struggling with getting solr.WordDelimiterFilterFactory to behave as is
indicated in the solr book (Smiley) on page 54.

the example in the books reads like this:


Here is an example exercising all options:
WiFi-802.11b to Wi, Fi, WiFi, 802, 11, 80211, b, WiFi80211b
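
To make the book's example concrete, here is a rough Python imitation of the splitting and catenation it describes. It is a simplified sketch, not the WordDelimiterFilter implementation: it splits on non-alphanumeric characters, lower-to-upper case changes and letter/digit boundaries, then joins adjacent word parts, adjacent number parts, and finally everything. Token positions, lowercasing and stemming are ignored, and the function names are made up.

```python
from itertools import groupby


def split_parts(token):
    """Split on delimiters, lower->upper case changes, and letter/digit boundaries."""
    parts, current = [], ""
    for ch in token:
        if not ch.isalnum():
            # Delimiter such as '-' or '.': close the current part.
            if current:
                parts.append(current)
                current = ""
            continue
        if current:
            prev = current[-1]
            if prev.isdigit() != ch.isdigit() or (prev.islower() and ch.isupper()):
                parts.append(current)
                current = ""
        current += ch
    if current:
        parts.append(current)
    return parts


def word_delimiter_terms(token):
    """Emit parts, catenated runs of words or numbers, and the full catenation."""
    parts = split_parts(token)
    terms = []
    for _, run in groupby(parts, key=str.isdigit):
        run = list(run)
        terms.extend(run)
        if len(run) > 1:
            terms.append("".join(run))  # catenateWords / catenateNumbers
    if len(parts) > 1:
        terms.append("".join(parts))    # catenateAll
    return terms


if __name__ == "__main__":
    for value in ("WiFi-802.11b", "B12.0123", "BP2.1UAA"):
        print(value, "->", ", ".join(word_delimiter_terms(value)))
```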


essentially - i have the same requirement with embedded periods and need to
return a successful search on a field, even if the user does NOT enter the
period.

i have a field, itemNo that can contain periods ..

example content in the itemNo field:

B12.0123

when the user searches on this field, they need to be able to enter an
itemNo without the period, and still find the item.

example:

user enters: B120123 and a document is returned with B12.0123.


unfortunately, the search will NOT return the appropriate document, if the
user enters B120123.

however - the search does work if the user enters B12 0123 (a space in place
of the period).

can someone help me understand what is missing from my configuration?


this is snipped from my schema.xml file


<fields>
  ...
  <field name="itemNo" type="text" indexed="true" stored="true"/>
  ...
</fields>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3724822.html
Sent from the Solr - User mailing list archive at Nabble.com.
