Uwe,

I don't have the answer to your main question, but I will point you to the n-gram tokenizers in Lucene's contrib/, in case you want to use those instead of maintaining your own bigram tokenizer.
Otis

Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message ----
From: Uwe Goetzke <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, October 18, 2007 9:07:10 AM
Subject: Scoring algorithm suggestion?

We have used Lucene in our product since version 1.2. I have developed a new bigram stemmer and would like a suggestion on how to implement the scorer it needs.

Using a Boolean query with a slope, I get the correct documents most of the time. For example, the bigram split for "document" is:

do oc cu um me en nt

If a user searches for the misspelled "documnts", I use a Boolean query with a slope that depends on the length of the search term. This works quite well, as

do oc cu um mn nt ts

gives 5 correct terms (do, oc, cu, um, nt).

In addition, I want terms that follow each other in the indexed document in the same order to get a higher score. In this case we have 5 terms in the correct order, which should give the document a boost of 4 (relatively speaking).

What type of query should I base the development of my scorer on?

Regards

Uwe Goetzke
development manager
Healy Hudson GmbH
http://www.healy-hudson.com
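
For readers following the thread, the arithmetic in the question can be checked with a small sketch. It is plain Python, not Lucene API, and the function names are hypothetical. It assumes that "terms in the same order" means the longest common subsequence of the query bigrams and the indexed bigrams, and that the boost is that length minus one. That is only one reading of "5 terms in the correct order, boost of 4", not a confirmed definition of the scorer.

```python
def bigrams(term):
    """Split a term into overlapping character bigrams."""
    return [term[i:i + 2] for i in range(len(term) - 1)]


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def bigram_score(query, indexed):
    """Return (matched, order_boost) for a query term against an indexed term.

    matched: number of query bigrams that also occur in the indexed term.
    order_boost: in-order bigrams minus one (assumed reading of the question).
    """
    q, d = bigrams(query), bigrams(indexed)
    indexed_set = set(d)
    matched = sum(1 for gram in q if gram in indexed_set)
    order_boost = max(lcs_length(q, d) - 1, 0)
    return matched, order_boost


if __name__ == "__main__":
    print(bigrams("document"))
    print(bigrams("documnts"))
    print(bigram_score("documnts", "document"))
```

Under these assumptions, "documnts" against "document" yields 5 matching bigrams and an order boost of 4. In Lucene itself the positional part would have to come from a position-aware query rather than a plain Boolean query. This sketch only illustrates the counting.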