Hi,

Apologies for repeating a question from the IRC room, but I am not sure whether that channel is still active.
I have no idea how Lucene works internally, but I need to modify a part of the rdf4j project that depends on it. I am using Lucene to create a mapping file based on text search, and I have run into the following problem.

Take the term 'abcd', which gets mapped to node 'abcd-2' even though node 'abcd' exists. The issue is that Lucene finds the term in both nodes, 'abcd' and 'abcd-2', and gives both matches the same score.

My question is: how can I modify the scoring to penalise a match where the searched term is only part of a longer word, and give a higher score when the term is a whole word by itself?

Visually, it looks like this:

node 'abcd':
  - name: abcd
  total score = LS /lucene score/ * 2.0 /name weight/

node 'abcd-2':
  - name: abcd-2
  - alias1: abcd-h
  - alias2: abcd-k9
  total score = LS * 2.0 + LS * 0.5 /alias1 score/ + LS * 0.1 /alias2 score/

I gave the properties different weights: "name" has the highest weight, but "alias" has a small weight as well. The total score for a node is the sum of each partial score multiplied by its weight. As a result, 'abcd-2' ends up with a higher score than 'abcd'.

Thanks,
Jacek
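
P.S. In case it helps, here is a minimal plain-Java sketch of the arithmetic above. It does not use Lucene or rdf4j at all. LS is simply passed in as a number, and the class name, the method names, and the 0.5 partial-match factor are all made up for illustration. The second method only shows the kind of result I am after, not how Lucene would actually compute it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NodeScoreSketch {
    // Hypothetical factor applied when the term is only part of a longer value.
    static final double PARTIAL_MATCH_FACTOR = 0.5;

    // Current behaviour: every matching property contributes LS * weight.
    // The map goes from property value to property weight.
    static double currentTotal(double luceneScore, Map<String, Double> weightByValue) {
        double total = 0.0;
        for (double weight : weightByValue.values()) {
            total += luceneScore * weight;
        }
        return total;
    }

    // Desired behaviour (assumption): a value equal to the whole term keeps
    // its full contribution, anything longer is scaled down.
    static double desiredTotal(String term, double luceneScore, Map<String, Double> weightByValue) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weightByValue.entrySet()) {
            double factor = e.getKey().equals(term) ? 1.0 : PARTIAL_MATCH_FACTOR;
            total += luceneScore * e.getValue() * factor;
        }
        return total;
    }

    // Node 'abcd': a single name property with weight 2.0.
    static Map<String, Double> nodeAbcd() {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("abcd", 2.0);
        return m;
    }

    // Node 'abcd-2': name (2.0), alias1 (0.5), alias2 (0.1).
    static Map<String, Double> nodeAbcd2() {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("abcd-2", 2.0);
        m.put("abcd-h", 0.5);
        m.put("abcd-k9", 0.1);
        return m;
    }

    public static void main(String[] args) {
        double ls = 1.0; // stand-in for the identical Lucene score on every hit
        System.out.println("current abcd   = " + currentTotal(ls, nodeAbcd()));
        System.out.println("current abcd-2 = " + currentTotal(ls, nodeAbcd2()));
        System.out.println("desired abcd   = " + desiredTotal("abcd", ls, nodeAbcd()));
        System.out.println("desired abcd-2 = " + desiredTotal("abcd", ls, nodeAbcd2()));
    }
}
```

With LS = 1.0 the current totals are 2.0 for 'abcd' and about 2.6 for 'abcd-2', which is the wrong order. With the made-up factor, 'abcd' stays at 2.0 and 'abcd-2' drops to about 1.3, which is the ranking I would like to get.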