Hi all,

(scroll to bottom for question)

I was setting up a simple web app to play around with phonetic filters.
The idea is simple: I create one document for each word in the English dictionary, and each document contains a single search field whose value is preprocessed by the following analyzer definition (written in our own DSL syntax, which gets transformed to Java):

analyzer soundslike{
    tokenizer = KeywordTokenizer
    tokenfilter = LowerCaseFilter
    tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
}
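For readers unfamiliar with `inject`, here is a simplified, stdlib-only Java model of what the chain above is meant to produce for a single word. This is not Lucene or Solr code. `SoundsLikeModel`, `analyze` and the `Token` record are hypothetical names. The encoder is a stub standing in for DoubleMetaphone, and no real phonetic encoding is implemented.

As I understand Solr's PhoneticFilter, with inject enabled the encoded form and the lowercased original are both emitted at the same position, so the second token has position increment 0. The model emits them in the order seen in the parsed query below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Simplified model of: KeywordTokenizer -> LowerCaseFilter -> PhoneticFilter(inject).
 * NOT the real Lucene/Solr implementation; the encoder is a caller-supplied stub.
 */
class SoundsLikeModel {

    /** One emitted token: its text and its position increment. */
    record Token(String text, int positionIncrement) {}

    static List<Token> analyze(String input, UnaryOperator<String> encoder, boolean inject) {
        List<Token> out = new ArrayList<>();
        // KeywordTokenizer: the whole input is a single token (nothing is trimmed).
        // LowerCaseFilter: lowercase that one token.
        String term = input.toLowerCase(Locale.ROOT);
        String code = encoder.apply(term);
        boolean hasCode = code != null && !code.isEmpty() && !code.equals(term);
        if (!hasCode) {
            out.add(new Token(term, 1));   // nothing usable to encode: pass the term through
        } else if (!inject) {
            out.add(new Token(code, 1));   // inject=false: the code replaces the term
        } else {
            out.add(new Token(code, 1));   // inject=true: phonetic code first...
            out.add(new Token(term, 0));   // ...then the original at the same position
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stub: pretend the encoder maps every "compet..." word to "KMPT".
        UnaryOperator<String> stub =
                s -> s.startsWith("compet") ? "KMPT" : s.toUpperCase(Locale.ROOT);
        System.out.println(analyze("Compete", stub, true));
        System.out.println(analyze("Compete", stub, false));
    }
}
```

With inject=true, the document for "compete" should therefore carry both `KMPT` and `compete` in the field `value`. That is why the parsed query contains both clauses.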

I can run the web app, and the results do (more or less) sound like the original query term.

What confuses me, though, is the ranking of the results, given that I set the inject parameter to true. If I search for the query term 'compete', the parsed query becomes '(value:KMPT value:compete)', so I expect the word 'compete' to be ranked higher than any other word... but it isn't.

Looking further at the explanation of the results, I saw that the term 'compete' from the parsed query is completely absent, and only the phonetic encoding seems to affect the ranking:

 * COMPETITOR
     o 4.368826 = (MATCH) sum of:
         + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
             # 0.52838135 = queryWeight(value:KMPT), product of:
                 * 8.26832 = idf(docFreq=150, maxDocs=216555)
                 * 0.063904315 = queryNorm
             # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
               product of:
                 * 1.0 = tf(termFreq(value:KMPT)=1)
                 * 8.26832 = idf(docFreq=150, maxDocs=216555)
                 * 1.0 = fieldNorm(field=value, doc=3174)
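The numbers in this explanation can be reproduced by hand. The sketch below assumes Lucene 3.x `DefaultSimilarity` with all boosts equal to 1. Under that assumption:

- idf = 1 + ln(maxDocs / (docFreq + 1))
- queryWeight = idf * queryNorm
- fieldWeight = tf * idf * fieldNorm
- queryNorm = 1 / sqrt(sum of the squared clause weights)

The class and method names are made up for illustration. Because queryNorm is computed over both clauses of '(value:KMPT value:compete)', it also carries information about the 'compete' clause that is missing from the explanation.

```java
/**
 * Reproduces the explanation arithmetic, ASSUMING Lucene 3.x DefaultSimilarity
 * and boosts of 1 everywhere. Illustrative names, not a Lucene API.
 */
class ExplainMath {

    /** DefaultSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1)). */
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    /** queryNorm = 1 / sqrt(sum of squared clause weights); weight = idf when boost = 1. */
    static double queryNorm(double... clauseIdfs) {
        double sum = 0.0;
        for (double w : clauseIdfs) {
            sum += w * w;
        }
        return 1.0 / Math.sqrt(sum);
    }

    /** Score of one matching term clause: queryWeight * fieldWeight. */
    static double termScore(double idf, double queryNorm, double tf, double fieldNorm) {
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        long maxDocs = 216555;
        double idfKmpt = idf(150, maxDocs);
        // Try two docFreq values for "value:compete" and compare with the
        // queryNorm reported in the explanation (0.063904315).
        double normIfDf0 = queryNorm(idfKmpt, idf(0, maxDocs));
        double normIfDf1 = queryNorm(idfKmpt, idf(1, maxDocs));
        System.out.printf("idf(KMPT)                        = %.5f%n", idfKmpt);
        System.out.printf("queryNorm if docFreq(compete)=0  = %.9f%n", normIfDf0);
        System.out.printf("queryNorm if docFreq(compete)=1  = %.9f%n", normIfDf1);
        System.out.printf("score of the KMPT clause         = %.6f%n",
                termScore(idfKmpt, 0.063904315, 1.0, 1.0));
    }
}
```

Under these assumptions, idf(KMPT) comes out at about 8.26832 and the KMPT clause at about 4.368826, matching the explanation. The reported queryNorm of 0.063904315 is reproduced when docFreq(value:compete) is 0. With docFreq 1 it would be about 0.0664. That would suggest the searcher producing this explanation did not find the term 'value:compete' in the index at all.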

Next, I ran our friend Luke. In Luke, I opened the Documents tab and iterated over the terms of the field 'value' until I found 'compete'. When I hit 'Show All Docs', the Search tab opens and displays the one and only document holding this value (i.e. the document representing the word 'compete'). It shows the query 'value:compete '. Then, when I hit the Search button again (the query is still 'value:compete '), it reports no results!?

Presumably the 'Show All Docs' button does something different from running a query in Luke's Search tab.
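Luke shows the query as 'value:compete ', with a trailing space as quoted above. KeywordTokenizer turns the whole field value into a single token without trimming it. One hedged way to rule out an invisible difference between the indexed term and the typed query text is to print the code points of both strings. A trailing space or a carriage return left over from the dictionary file would be examples, but these are purely hypothetical causes. `TermDump` and `dump` are illustrative names, not part of Luke or Lucene.

```java
import java.util.stream.Collectors;

/** Hypothetical helper: makes whitespace and control characters in a term visible. */
class TermDump {

    /** Renders each UTF-16 char of s as U+XXXX, separated by spaces. */
    static String dump(String s) {
        return s.chars()
                .mapToObj(c -> String.format("U+%04X", c))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(dump("compete"));
        // Hypothetical: the same word as it would look if read with a Windows line ending.
        System.out.println(dump("compete\r"));
    }
}
```

If the dump of the indexed term has characters that the query text lacks, a TermQuery built from the query text would not match it.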

Q: Can somebody explain why the injected original terms seem to be ignored at query time? Or could it be related to the name of the search field ('value'), or to something else?

We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).

-Elmer

