We have used Lucene in our product since version 1.2.

I have developed a new bigram stemmer and would like suggestions on
how to implement the scorer it needs.

 

With a Boolean query and a slop, I get the correct documents most of
the time.

 

For example, the bigram split for "document" is

 

do oc cu um me en nt
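
As a minimal sketch of this split (plain Python for illustration, not Lucene code; the function name `bigrams` is hypothetical):

```python
def bigrams(word):
    """Return the overlapping two-character terms of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


print(" ".join(bigrams("document")))  # prints: do oc cu um me en nt
```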

 

If a user searches for the misspelled "documnts", I use a Boolean query
with a slop that depends on the length of the search term.

This works quite well, as

do oc cu um mn nt ts

gives 6 correct terms.
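
The counting can be sketched as follows. This is plain Python for illustration, not Lucene code, and the names `bigrams` and `matching_bigrams` are hypothetical. The indexed word forms used below are assumptions; the count depends on which form is actually in the index.

```python
def bigrams(word):
    """Return the overlapping two-character terms of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def matching_bigrams(query, indexed):
    """Query bigrams that also occur somewhere in the indexed word.

    Order is ignored here, as in a plain Boolean OR over the terms.
    """
    indexed_terms = set(bigrams(indexed))
    return [term for term in bigrams(query) if term in indexed_terms]


# Assumed indexed forms, for illustration only.
print(len(matching_bigrams("documnts", "documents")))  # prints: 6
print(len(matching_bigrams("documnts", "document")))   # prints: 5
```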

 

In addition, I want terms that follow each other in the same order in
the indexed document to get a higher score.

In this case there are 5 terms in the correct order, which should give
the document a boost of 4 (relatively speaking).
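
One possible reading of this, sketched in plain Python rather than as a Lucene scorer: count the query bigrams that appear in the same relative order in the indexed word (a longest common subsequence), and use that count minus one as the bonus. Both the subsequence interpretation and the "minus one" rule are assumptions chosen to reproduce the numbers above; the names are hypothetical.

```python
def bigrams(word):
    """Return the overlapping two-character terms of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def ordered_matches(query_terms, doc_terms):
    """Length of the longest common subsequence of two term lists."""
    rows, cols = len(query_terms), len(doc_terms)
    # table[i][j] = LCS length of query_terms[:i] and doc_terms[:j]
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if query_terms[i - 1] == doc_terms[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def order_bonus(query, indexed):
    """Assumed rule: n terms in the correct order give a bonus of n - 1."""
    in_order = ordered_matches(bigrams(query), bigrams(indexed))
    return max(in_order - 1, 0)


# Assumed indexed form "document", for illustration only.
print(ordered_matches(bigrams("documnts"), bigrams("document")))  # prints: 5
print(order_bonus("documnts", "document"))                        # prints: 4
```

In an actual index the order check would have to work on term positions rather than on the raw word, so this only illustrates the intended scoring, not how a scorer would obtain it.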

 

Which type of query should I base my scorer on?

 

 

Regards

Uwe Goetzke

development manager

________________________________________________

 

Healy Hudson GmbH  

Nelkenstrasse 43

67691 Hochspeyer

  

mailto:[EMAIL PROTECTED]

http://www.healy-hudson.com

 



