Hi Ahmet,

That works! Still, I do not understand how that stuff works. I only know that an analyser cuts the indexed text into tokens, but I do not know how the matching is done.
Can you recommend a good book to read? I would prefer something with less maths and more examples. The only one I have found is the free "An Introduction to Information Retrieval", but it has a lot of maths I do not understand.

Best regards,
Jacek

On 8 June 2017 at 19:36, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

> Hi,
>
> You can completely ban within-a-word search by simply using
> WhitespaceTokenizer, for example. By the way, it is all about how you
> tokenize/analyze your text. Once you have decided, you can create two
> versions of a single field using different analysers. This allows you to
> assign different weights to those fields at query time.
>
> Ahmet
>
>
> On Thursday, June 8, 2017, 2:56:37 PM GMT+3, Jacek Grzebyta <
> grzebyta....@gmail.com> wrote:
>
> Hi,
>
> Apologies for repeating a question from the IRC room, but I am not sure
> if that is alive.
>
> I have no idea how lucene works, but I need to modify some part of the
> rdf4j project which depends on it.
>
> I need to use lucene to create a mapping file based on text searching, and
> I found the following problem. Let us take a term 'abcd' which is mapped
> to node 'abcd-2' whereas node 'abcd' exists. I found the issue is that
> lucene searches for the term, finds it in both nodes 'abcd' and 'abcd-2',
> and gives the same score. My question is: how can I modify the scoring to
> penalise the case where the searched term is part of a longer word, and
> give a higher score if it is a word by itself?
>
> Visually it looks like this:
>
> node 'abcd':
> - name: abcd
>
> total score = LS /lucene score/ * 2.0 /name weight/
>
> node 'abcd-2':
> - name: abcd-2
> - alias1: abcd-h
> - alias2: abcd-k9
>
> total score = LS * 2.0 + LS * 0.5 /alias1 score/ + LS * 0.1 /alias2 score/
>
> I gave different weights to the properties. "Name" has the highest weight,
> but "alias" has some small weight as well. In total, the score for a node
> is the sum of all partial scores * weight. As a result, 'abcd-2' gets a
> higher score than 'abcd'.
>
> thanks,
> Jacek
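
For readers following the thread, the tokenization point in the quoted reply and the weighted-sum score in the original question can be illustrated with a small sketch. This is plain Java and does not use the Lucene API: the class `TokenMatchSketch`, its two toy tokenizers, and the fixed per-field score of 1.0 standing in for LS are all assumptions made for illustration. It only aims to show that matching compares whole tokens, so whether 'abcd' matches 'abcd-2' depends on whether the analyser splits at the hyphen.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TokenMatchSketch {

    // Field weights taken from the question: name 2.0, alias1 0.5, alias2 0.1.
    public static final Map<String, Double> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("name", 2.0);
        WEIGHTS.put("alias1", 0.5);
        WEIGHTS.put("alias2", 0.1);
    }

    // Toy stand-in for whitespace tokenization: "abcd-2" stays a single token.
    public static List<String> whitespaceTokens(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Toy stand-in for a tokenizer that breaks on punctuation (assumed here to
    // resemble hyphen-splitting analysers): "abcd-2" becomes "abcd" and "2".
    public static List<String> wordTokens(String text) {
        return Arrays.stream(text.split("[^A-Za-z0-9]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Matching is exact token equality. The 1.0 is a placeholder for LS,
    // not a real Lucene score.
    public static double fieldScore(String term, String fieldValue,
                                    Function<String, List<String>> tokenizer) {
        return tokenizer.apply(fieldValue).contains(term) ? 1.0 : 0.0;
    }

    // Total node score = sum over fields of (field score * field weight).
    public static double nodeScore(String term, Map<String, String> fields,
                                   Function<String, List<String>> tokenizer) {
        double total = 0.0;
        for (Map.Entry<String, String> f : fields.entrySet()) {
            double weight = WEIGHTS.getOrDefault(f.getKey(), 0.0);
            total += fieldScore(term, f.getValue(), tokenizer) * weight;
        }
        return total;
    }

    // Helper: build an ordered field map from alternating key/value arguments.
    public static Map<String, String> fields(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> nodeAbcd = fields("name", "abcd");
        Map<String, String> nodeAbcd2 = fields(
                "name", "abcd-2", "alias1", "abcd-h", "alias2", "abcd-k9");

        System.out.println("punctuation-splitting tokenizer:");
        System.out.println("  abcd   -> " + nodeScore("abcd", nodeAbcd, TokenMatchSketch::wordTokens));
        System.out.println("  abcd-2 -> " + nodeScore("abcd", nodeAbcd2, TokenMatchSketch::wordTokens));

        System.out.println("whitespace tokenizer:");
        System.out.println("  abcd   -> " + nodeScore("abcd", nodeAbcd, TokenMatchSketch::whitespaceTokens));
        System.out.println("  abcd-2 -> " + nodeScore("abcd", nodeAbcd2, TokenMatchSketch::whitespaceTokens));
    }
}
```

With the punctuation-splitting tokenizer, every field of node 'abcd-2' contains the token 'abcd', so its weighted sum exceeds that of node 'abcd'. With the whitespace tokenizer, only node 'abcd' matches. Real Lucene scoring also takes term frequency, field length and other factors into account, which this sketch deliberately leaves out.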