Hi Sumit,

Here's a good place to start:

   http://lucene.apache.org/java/docs/scoring.html

Steve

On 12/28/2007 at 12:30 PM, sumittyagi wrote:
> 
> also
> what is the lucene ranking (scoring documents) formula
> 
> sumittyagi wrote:
> > 
> > hi which file can i edit to change the scoring factors in lucene results
> > 
> > markharw00d wrote:
> > > 
> > > Thanks for the context - much more useful. The challenge here is
> > > similar to that posed by offering end-user tagging of content (see here
> > > 
> http://www.mail-archive.com/java-user@lucene.apache.org/msg175
> 80.html ).
> > > The main difference here being that words are added to docs implicitly
> > > by search click-throughs rather than any explicit tagging action.
> > > 
> > > In both cases the challenge is that the user data around documents is
> > > likely to be updated very often while the documents remain relatively
> > > static. I suspect some additional things to think about are: 1)
> > > Cancelling out the "human laziness" bias that favours clicking results
> > > on page 1. Are clicks on page 2 worth more? 2) Spam clicks - detecting
> > > deliberate gaming of your re-ranking algorithm. 3) Lucene doc IDs are
> > > not stable - how will you associate query terms/click data with
> > > documents and join them at speed? 4) Are individual words or phrases
> > > the unit of boost? "Paris" means different things in "Paris Hilton"
> > > and "Paris, France".
> > > 
> > > A simple approach might be to re-index your content with all of the
> > > additional search terms from clicks added to the associated document
> > > in a "searchClicks" field - the more clicks, the more repetitions of
> > > the same search words in the document to help with tf (Term
> > > Frequency). This additional content would need to be capped, to avoid
> > > huge documents. This has the disadvantage of requiring a re-index
> > > though. Another option to avoid reindexing everything is to wrap
> > > IndexReader (See FilterIndexReader) and implement TermEnum/TermDocs
> > > for a fake field called "searchClicks". The idea is Lucene looks after
> > > the usual, static document content while your implementation goes off
> > > to your more volatile storage (e.g. database/parallel index, custom
> > > file structure) to retrieve lists of doc ids, term frequencies etc.
> > > for this "searchClicks" field. All of the Lucene queries you might
> > > want to throw at this e.g. PhraseQueries can then test both the static
> > > Lucene fields and your new volatile "click" fields without being aware
> > > of this low-level trickery.
> > > 
> > > I'm sure there will be other ways of doing this too but this seems
> > > like a conceptually clean way of modelling it - just seeing search
> > > terms as extensions to the document content.
> > > 
> > > Cheers
> > > Mark
> > > 
> > > 
> > > ----- Original Message ----
> > > From: sumittyagi <[EMAIL PROTECTED]>
> > > To: java-user@lucene.apache.org
> > > Sent: Sunday, 23 December, 2007 5:30:55 AM
> > > Subject: Re: Which file in the lucene package is used to manipulate
> > > results..
> > > 
> > > 
> > > Actually what i have to do is...
> > > 1.) for every query(keyword), among the results obtained,
> the keyword
> > >  will
> > > be mapped with the page clicked, along with the no. of clicks for that
> > > keyword on that page 2.) next time for the same query(keyword), the
> > > mapped pages will be
> > >  ranked
> > > higher considering the no. of clicks too..
> > > 3.) for every new query these steps will be repeated...
> > > this was a very high level view , i have made algorithms for these
> > >  modules
> > > and trying to incorporate with lucene but dont know , on
> which files i
> > >  have
> > > to do edition to make it work...
> > > please help me regarding this, if you need some more explanation,
> > >  please let
> > > me know...
> > > thanks
> > > Sumit Tyagi
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Erick Erickson wrote:
> > > > 
> > > > You still haven't explained *why* you want to rerank results. What is
> > > > the use-case you're trying to implement? Quite often it's turned out
> > > > for me that when I let folks on the list know what the use case I'm
> > > > trying to support is, they come up with much more elegant solutions
> > > > than I was thinking about.
> > > > 
> > > > For instance, does the CustomScoreQuery class have any relevance
> > > > to your problem?
> > > > 
> > > > If you're thinking of modifying the core Lucene code for your special
> > > > purpose, I'd advise against it unless and until you'd exhausted all
> > > > the other options. It's always a maintenance headache to do this.
> > > > 
> > > > Best
> > > > Erick
> > > > 
> > > > On Dec 21, 2007 10:09 AM, sumittyagi <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > 
> > > > > actually i am writing a module to rerank the results, so
> i want to
> > >  edit
> > > > > the file which arrange the results and give them ranks, or is there
> > > > > any other way i can use my module to rerank the results
> > > > > 
> > > > > 
> > > > > markharw00d wrote:
> > > > > > 
> > > > > > I think you need to describe your "factors" in more detail.
> > >  Exactly
> > > > > what
> > > > > > do you want to achieve for your users?
> > > > > > We could be talking about any number of Lucene functions here.
> > > > > > 
> > > > > > ----- Original Message ---- From: sumittyagi <[EMAIL PROTECTED]>
> > > > > > To: java-user@lucene.apache.org Sent: Friday, 21 December, 2007
> > > > > > 4:51:09 AM Subject: Which file in the lucene package is used to
> > > > > > manipulate results..
> > > > > > 
> > > > > > 
> > > > > > hi, i am using lucene for the very first time and want to
> > >  manipulate
> > > > > >  the
> > > > > > results, by adding some more factors to it, which file should i
> > > > > > edit to manipulate the search results....
> > > > > > 
> > > > > > Thanks
> > > > > > Sumit Tyagi
> > > > > > --
> > > > > > View this message in context:
> > > > > > 
> > > > > > 
> > > > > 
> > > 
> > > 
> http://www.nabble.com/Which-file-in-the-lucene-package-is-used
> -to-manipulate-results..-tp14450335p14450335.html
> > > > > > Sent from the Lucene - Java Users mailing list archive at
> > > > > > Nabble.com.
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > __________________________________________________________ Sent
> > > > > > from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
> > > > > > 
> > > > > > 
> > > > > > 
> > > 
> ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: [EMAIL PROTECTED] For
> > > > > > additional commands, e-mail: [EMAIL PROTECTED]
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > --
> > > > > View this message in context:
> > > > > 
> > > 
> > > 
> http://www.nabble.com/Which-file-in-the-lucene-package-is-used
> -to-manipulate-results..-tp14450335p14456938.html
> > > > > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> > > > > 
> > > > > 
> > > > > 
> > > 
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > > 
> > > > > 
> > > > 
> > > > 
> > > 
> > > --
> > > View this message in context:
> > > 
> > > 
> http://www.nabble.com/Which-file-in-the-lucene-package-is-used
> -to-manipulate-results..-tp14450335p14476062.html
> > > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> > > 
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED] For
> > > additional commands, e-mail: [EMAIL PROTECTED]
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > >       __________________________________________________________
> > > Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
> > > 
> > > 
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [EMAIL PROTECTED] For
> > > additional commands, e-mail: [EMAIL PROTECTED]
> > > 
> > > 
> > > 
> > 
> > 
> 
> -- View this message in context:
> http://www.nabble.com/Which-file-in-the-lucene-package-is-used
> -to-manipulate-results..-tp14450335p14528677.html Sent from the Lucene -
> Java Users mailing list archive at Nabble.com.
> 
> 
> --------------------------------------------------------------------- To
> unsubscribe, e-mail: [EMAIL PROTECTED] For
> additional commands, e-mail: [EMAIL PROTECTED]
> 
>

 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to