Very interesting the link you suggest me Mr Grant Ingersoll. Let see if I understand how the ranking issue in lucene could be implemented: 1. First I must create my own query class extending the abstract Query class. The only method I must implement from this class is toString. Is right this ??? 2. I must implement inside my own query class the Weight interface But I really don't understand how this is going to let me change my ranking scoring. 3 I must implement my custom Scorer ??? I don't know how integrate this. There is a lot of little pieces of information but not concrete. Greetings
On Nov 7, 2007 1:48 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Term Vectors (specifically TermFreqVector) in Lucene are a storage > mechanism for convenience and applications to use. They are not an > integral part of the scoring in the way you may be thinking of them in > terms of the traditional Vector Space Model, thus there may be some > confusion from the different usages of that terminology. If you want > to see examples of how to implement scorers have a look at classes > like TermScorer, BoostingTermQuery, and any of the other classes that > extend Scorer. You might also find the file formats page (off of the > Lucene Java website under Documentation) helpful for understanding > what Lucene stores so that it can do scoring. > > There really isn't any tutorial on scoring, as it is not something > that many people have expressed an interest in or no one has made it a > high enough priority to write one. Having written a Scorer (or maybe > two, I forget) I can give advice on specific things, but I am not sure > I could write a tutorial that is general enough to be useful at this > point. > > One thought for associating a weight to a given term based on its > cooccurring terms is to use the new Payload mechanism whereby you can > store a byte array at each term which can then be used in scoring via > things like the BoostingTermQuery (or your own implementation.) If > that is of interest, you can search the archives for payloads (I also > think Michael Busch is presenting on Payloads, amongst other things, > at ApacheCon in Atlanta) and have a look at the BoostingTermQuery. > There certainly are other PayloadQueries that need to be implemented. > See the Lucene wiki for some background and details on Payloads as well. > > I don't know that it is a big mistake to try this in Lucene. The > community hasn't put a huge priority on making altering the innards of > scoring easier to deal with (if possible), but that doesn't mean we > are not open to suggestions and patches. You may find > https://issues.apache.org/jira/browse/LUCENE-965 > to be informative for both the implementation and the discussion of > things that need to happen to be accepted into Lucene. This JIRA > issue specifically attempts to provide Lucene with a new scoring > mechanism. > > You might also have a look at Lemur (http://www.lemurproject.org/) > which is much more academically focused. > > Cheers, > Grant > > > On Nov 7, 2007, at 12:49 PM, Ariel wrote: > > > Then if I want to use another scoring formula I must to implement my > > own Query/Weigh/Scorer ? For example instead of cousine distance > > leiderbage distance or .. another. I'm studying Query/Weigh/Scorer > > classes to find out how to do that but there is not much documentation > > about that. > > > > I have seen I could change similarity factors extending the simlarity > > class, but I have not seen any example about changing scoring formula > > and changing the weight by term in the term vector. Do you know any > > tutorial about this ? > > > > What I want to do changing frecuency in the terms vector is this: for > > example instead of take the tf term frecuency of the term and stored > > in the vector I want to consider the correlation of the term with the > > other terms of the documents and store that measure by term in the > > vector so later with my custom similarity formula calculate the > > ranking of a document against a query considering the correlation > > between terms. > > Dou you think is a big mistake try to do this with lucene ??? Is > > there any way ? > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > -------------------------- > Grant Ingersoll > http://lucene.grantingersoll.com > > Lucene Boot Camp Training: > ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com > > Lucene Helpful Hints: > http://wiki.apache.org/lucene-java/BasicsOfPerformance > http://wiki.apache.org/lucene-java/LuceneFAQ > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]