Hi Lukai

That was a great help. Thank you.
I’m continuing to read about payloads:
http://searchhub.org/2009/08/05/getting-started-with-payloads/
I didn’t know that concept at all.

Regards,
Rune

On 13/02/2014 at 23:12, lukai <lukai1...@gmail.com> wrote:

> Hi, Rune:
> Per your requirement, you can generate a separate field for the document
> before sending the document to Lucene. Let's say its name is score_field.
> The content of this field looks like this:
> Doc 1#score_field:
> Lucene:0.7 is:0 ...
> Doc 2#score_field:
> Lucene:0.5 is:0 ...
>
> Mark this field as "indexed" and the other fields as "stored". Store the
> weight value as a payload on each term (wrap your analyzer to consume the
> weight value; basically you can combine DelimitedPayloadTokenFilter and
> WhitespaceTokenizer to form a basic analyzer that accepts this input
> format). Make sure each term in a document's score_field is unique
> (according to your description that is already fulfilled). You can also
> disable indexing of position information for this field, since you don't
> need it.
>
> Then when you query:
> 1. If you want a score like the cosine similarity between query and
> document, you should implement a query parser that parses the weights you
> assign to the different terms in the query.
> 2. Create a new query type, customize your score function, and tell Lucene
> to use your scorer.
>
> Here is a small snippet of a query type I created before; basically you
> can follow this logic to manipulate your score value:
>
>     final Terms terms = fields.terms(fieldName);
>     if (terms != null) {
>         final TermsEnum termsEnum = terms.iterator(null);
>         BytesRef bytes = new BytesRef(wandTerm.queryTerm);
>         if (termsEnum.seekExact(bytes)) {
>             // maxFeatureValue() is not a stock TermsEnum method; it
>             // presumably comes from a customized codec.
>             float ub = termsEnum.maxFeatureValue();
>             int docFreq = termsEnum.docFreq();
>             // logger.warn("term:" + wandTerm.queryTerm + " :" + ub);
>             DocsAndPositionsEnum docsPositionEnum =
>                 termsEnum.docsAndPositions(acceptDocs, null);
>             tts.add(newWandPosting(fieldName, bytes, docsPositionEnum, ub,
>                 wandTerm.featureValue, (totalDocNum + 1) * 1.0f / docFreq));
>         }
>     }
>
> On Thu, Feb 13, 2014 at 10:49 AM, Rune Stilling <s...@rdfined.dk> wrote:
>
>> I'm not sure how I would do that, since Lucene is meant to use my custom
>> weights when calculating document weights for a search query.
>>
>> Doc 1
>> Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99)
>> API(0.3)
>>
>> Doc 2
>> Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)
>>
>> Query
>> Lucene
>>
>> 0.7 and 0.5 are my custom weights and should be used to return Doc 1 with
>> weight 0.7 and Doc 2 with weight 0.5 as an answer to my query.
>>
>> /Rune
>>
>> On 13/02/2014 at 13:27, Shai Erera <ser...@gmail.com> wrote:
>>
>>> I often prefer to manage such weights outside the index. Usually managing
>>> them inside the index leads to problems in the future when e.g. the
>>> weights change. If they are encoded in the index, it means re-indexing.
>>> Also, if the weight changes then in some segments the weight will be
>>> different than in others. I think that if you manage the weights e.g. in
>>> a simple FST (which is very compact), it will give you the best
>>> flexibility and it's very easy to use.
>>>
>>> Shai
>>>
>>>
>>> On Thu, Feb 13, 2014 at 1:36 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>
>>>> You could stuff your custom weights into a payload, and index that,
>>>> but this is per term per document per position, while it sounds like
>>>> you just want one float for each term regardless of which
>>>> documents/positions where that term occurred?
>>>>
>>>> Doing your own custom attribute would be a challenge: not only must
>>>> you create & set this attribute during indexing, but you then must
>>>> change the indexing process (custom chain, custom codec) to get the
>>>> new attribute into the index, and then make a custom query that can
>>>> pull this attribute at search time.
>>>>
>>>> What are these term weights? Are you sure you can't compute these
>>>> weights at search time with a custom similarity using the stats that
>>>> are already stored (docFreq, totalTermFreq, maxDoc, etc.)?
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Thu, Feb 13, 2014 at 2:40 AM, Rune Stilling <s...@rdfined.dk> wrote:
>>>>> Hi list
>>>>>
>>>>> I'm trying to figure out how customizable scoring and weighting is in
>>>>> the Lucene API. I read about the APIs but still can't figure out if
>>>>> the following is possible.
>>>>>
>>>>> I would like to do normal document text indexing, but I would like to
>>>>> control the weight added to tokens myself. I would also like to
>>>>> control the weighting of query tokens and how things are added
>>>>> together.
>>>>>
>>>>> When indexing a word I would like to attach my own weights to the
>>>>> word, and use these weights when querying for documents. For example:
>>>>>
>>>>> Doc 1
>>>>> Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99)
>>>>> API(0.3)
>>>>>
>>>>> Doc 2
>>>>> Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)
>>>>>
>>>>> The floats in parentheses are values I would like to add in the
>>>>> indexing process, not something coming from Lucene's TF-IDF, for
>>>>> example.
>>>>>
>>>>> When querying I would like to repeat this and also create the weights
>>>>> for each term "myself" and control how the final doc score is
>>>>> calculated.
>>>>>
>>>>> I have read that it's possible to attach your own custom attributes to
>>>>> tokens. Is this the way to go? I.e. should I add my custom weights as
>>>>> attributes to tokens, and then access these attributes when calculating
>>>>> the document score in the search process (described here
>>>>> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html
>>>>> under "adding a custom attribute")?
>>>>>
>>>>> The reason why I'm asking is that I can't find any examples of this
>>>>> being done anywhere. But I found someone stating "With Lucene, it is
>>>>> impossible to increase or decrease the weight of individual terms in a
>>>>> document".
>>>>>
>>>>> With regards
>>>>> Rune
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
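
For readers following the thread, here is a minimal, Lucene-free sketch of the behaviour Rune describes and of the "term:weight" score_field format lukai suggests. It only illustrates the intended result (the query "Lucene" returning Doc 1 with 0.7 and Doc 2 with 0.5); it does not use payloads or any Lucene API. The class and method names (WeightedFieldSketch, parseScoreField, score) are hypothetical and made up for this illustration, and the Doc 2 content is abbreviated to single-word terms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustration only: hypothetical helper, not part of Lucene. */
final class WeightedFieldSketch {

    /**
     * Parses whitespace-separated "term:weight" tokens, e.g. "Lucene:0.7 is:0",
     * into a term -> weight map. Tokens without that shape are skipped.
     */
    static Map<String, Float> parseScoreField(String content) {
        Map<String, Float> weights = new LinkedHashMap<>();
        for (String token : content.trim().split("\\s+")) {
            int sep = token.lastIndexOf(':');
            if (sep <= 0 || sep == token.length() - 1) {
                continue; // no term or no weight in this token
            }
            String term = token.substring(0, sep);
            float weight = Float.parseFloat(token.substring(sep + 1));
            weights.put(term, weight);
        }
        return weights;
    }

    /** Score of a single-term query: the stored weight, or 0 if the term is absent. */
    static float score(Map<String, Float> docWeights, String queryTerm) {
        return docWeights.getOrDefault(queryTerm, 0f);
    }

    public static void main(String[] args) {
        Map<String, Float> doc1 = parseScoreField(
            "Lucene:0.7 is:0 a:0 powerful:0.9 indexing:0.62 and:0 search:0.99 API:0.3");
        Map<String, Float> doc2 = parseScoreField(
            "Lucene:0.5 is:0 smart:0 people:0.1");

        // Each document is scored purely by the custom weight of the query term.
        System.out.println("Doc 1: " + score(doc1, "Lucene"));
        System.out.println("Doc 2: " + score(doc2, "Lucene"));
    }
}
```

In an actual index the weight would travel with each term as a payload and the lookup would happen inside a custom query or scorer, as described in the thread; the map here merely stands in for that per-document storage.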