On 13/02/2014 at 12:36, Michael McCandless <luc...@mikemccandless.com> wrote:

> You could stuff your custom weights into a payload, and index that,
> but this is per term per document per position, while it sounds like
> you just want one float for each term regardless of which
> documents/positions where that term occurred?

No, I want to store a weight per term per document. The point is that my custom 
term weight depends semantically on the document context, in exactly the same 
way as the other standard term weights do.

It doesn’t make sense to also have a separate weight per position.

> Doing your own custom attribute would be a challenge: not only must
> you create & set this attribute during indexing, but you then must
> change the indexing process (custom chain, custom codec) to get the
> new attribute into the index, and then make a custom query that can
> pull this attribute at search time.

Hmm, well, but would it solve my problem then?

> What are these term weights?  Are you sure you can't compute these
> weights at search time with a custom similarity using the stats that
> are already stored (docFreq, totalTermFreq, maxDoc, etc.)?

Yes, I’m sure. I’m doing a semantic analysis of the documents before they are 
indexed, and it’s the result of this analysis that I want to store as a custom 
weight on a term-per-document basis. The docFreq, etc. reflect a quite simple 
approach to term weighting (i.e. tf-idf), which just isn’t precise enough in 
my case.

So it seems I might as well build my own term lists and code the indexing and 
searching process manually?
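
To make that concrete, here is a minimal sketch of what such a hand-built term list could look like in plain Java, without Lucene. The class name WeightedIndex and its methods are hypothetical, and scoring by summing query weight times stored document weight is only an assumption about how the final score might be combined.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory index: term -> (docId -> precomputed weight). */
class WeightedIndex {
    private final Map<String, Map<Integer, Float>> postings = new HashMap<>();

    /** Store one weight per term per document (no per-position data). */
    void add(int docId, String term, float weight) {
        if (weight == 0f) {
            return; // zero-weight terms can never contribute to a score
        }
        postings.computeIfAbsent(term, t -> new HashMap<>()).put(docId, weight);
    }

    /**
     * Assumed scoring: for each document, sum queryWeight * storedWeight
     * over all query terms that occur in that document.
     */
    Map<Integer, Float> search(Map<String, Float> queryWeights) {
        Map<Integer, Float> scores = new HashMap<>();
        for (Map.Entry<String, Float> q : queryWeights.entrySet()) {
            Map<Integer, Float> docs = postings.get(q.getKey());
            if (docs == null) {
                continue; // query term not in the index
            }
            for (Map.Entry<Integer, Float> d : docs.entrySet()) {
                scores.merge(d.getKey(), q.getValue() * d.getValue(), Float::sum);
            }
        }
        return scores;
    }
}
```

With the weights from my earlier example, adding (1, "lucene", 0.7f) and (2, "lucene", 0.5f) and searching for "lucene" with query weight 1.0 would rank Doc 1 above Doc 2. Everything Lucene normally handles (analysis, persistence, skipping, top-k collection) would still have to be written by hand.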

With regards,
Rune

> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Thu, Feb 13, 2014 at 2:40 AM, Rune Stilling <s...@rdfined.dk> wrote:
>> Hi list
>> 
>> I'm trying to figure out how customizable scoring and weighting are in the 
>> Lucene API. I have read about the APIs but still can't figure out if the 
>> following is possible.
>> 
>> I would like to do normal document text indexing, but I would like to 
>> control the weight added to tokens myself. I would also like to control the 
>> weighting of query tokens and how things are added together.
>> 
>> When indexing a word I would like to attach my own weights to the word, and 
>> use these weights when querying for documents. For example:
>> 
>> Doc 1
>> Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) 
>> API(0.3)
>> 
>> Doc 2
>> Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) people(0.1)
>> 
>> The floats in parentheses are values I would like to add in the indexing 
>> process, not something coming from Lucene (tf-idf, for example).
>> 
>> When querying I would like to repeat this and also create the weights for 
>> each term "myself" and control how the final doc score is calculated.
>> 
>> I have read that it's possible to attach your own custom attributes to 
>> tokens. Is this the way to go? I.e., should I add my custom weights as 
>> attributes to tokens, and then access these attributes when calculating 
>> document score in the search process (described here 
>> https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/analysis/package-summary.html
>>  under "adding a custom attribute")?
>> 
>> The reason why I'm asking is that I can't find any examples of this being 
>> done anywhere. But I found someone stating "With Lucene, it is impossible to 
>> increase or decrease the weight of individual terms in a document".
>> 
>> With regards
>> Rune
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

