Re: Which file in the lucene package is used to manipulate results..

sumittyagi Wed, 20 Feb 2008 11:17:54 -0800

hi Mark
Actually i am using object oriented database. Where can i find the
information regarding embedding lucene with database.
Thanks



mark harwood wrote:
> 
> Hi Sumit,
>>>now i want my database to communicate with lucene api 
> 
> I would recommend that it's the other way round....see my earlier comment
> on using FilterIndexReader and creating "faked" TermEnum and TermDocs to
> make your database content appear as if it were part of the index when
> calling Lucene. If you do want to make the database call Lucene see the
> recent work on embedding Lucene in Oracle.
> There is no simple ready-made solution here that I can post in a few lines
> of code - you'll need to familiarise yourself with these low-level APIs
> that underpin Lucene searches (they are all documented).
> 
> 
> Cheers
> Mark
> 
> ----- Original Message ----
> From: sumittyagi <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, 20 February, 2008 4:43:56 PM
> Subject: Re: Which file in the lucene package is used to manipulate
> results..
> 
> 
> Hi Mark Harwood
> I know it's being a long time, but till now i was busy in developing the
> database to store the keyword, document and no. of clicks of the document
> for the keyword and their respective mappings.
> now i want my database to communicate with lucene api and i cannot figure
> it
> out where to start from.
> Please help me out, how can i make my database to work with lucene.
> Thanks
> Sumit
> 
> mark harwood wrote:
>> 
>> Thanks for the context - much more useful.
>> The challenge here is similar to that posed by offering end-user tagging
>> of content (see here
>> http://www.mail-archive.com/[email protected]/msg17580.html ).
>> The main difference here being that words are added to docs implicitly by
>> search click-throughs rather than any explicit tagging action.
>> 
>> In both cases the challenge is that the user data around documents is
>> likely to be updated very often while the documents remain relatively
>> static. 
>> I suspect some additional things to think about are:
>> 1) Cancelling out the "human laziness" bias that favours clicking results
>> on page 1. Are clicks on page 2 worth more?
>> 2) Spam clicks - detecting deliberate gaming of your re-ranking
>> algorithm.
>> 3) Lucene doc IDs are not stable - how will you associate query
>> terms/click data with documents and join them at speed?
>> 4) Are individual words or phrases the unit of boost? "Paris" means
>> different things in "Paris Hilton" and "Paris, France".
>> 
>> A simple approach might be to re-index your content with all of the
>> additional search terms from clicks added to the associated document in a
>> "searchClicks" field - the more clicks, the more repetitions of the same
>> search words in the document to help with tf (Term Frequency). This
>> additional content would need to be capped, to avoid huge documents. This
>> has the disadvantage of requiring a re-index though. 
>> Another option to avoid reindexing everything is to wrap IndexReader (See
>> FilterIndexReader) and implement TermEnum/TermDocs for a fake field
>> called
>> "searchClicks". The idea is Lucene looks after the usual, static document
>> content while your implementation goes off to your more volatile storage
>> (e.g. database/parallel index, custom file structure) to retrieve lists
>> of
>> doc ids, term frequencies etc. for this "searchClicks" field. All of the
>> Lucene queries you might want to throw at this e.g. PhraseQueries can
>> then
>> test both the static Lucene fields and your new volatile "click" fields
>> without being aware of this low-level trickery. 
>> 
>> I'm sure there will be other ways of doing this too but this seems like a
>> conceptually clean way of modelling it - just seeing search terms as
>> extensions to the document content.
>> 
>> Cheers
>> Mark
>> 
>> 
>> ----- Original Message ----
>> From: sumittyagi <[EMAIL PROTECTED]>
>> To: [email protected]
>> Sent: Sunday, 23 December, 2007 5:30:55 AM
>> Subject: Re: Which file in the lucene package is used to manipulate
>> results..
>> 
>> 
>> Actually what i have to do is...
>> 1.) for every query(keyword), among the results obtained, the keyword
>>  will
>> be mapped with the page clicked, along with the no. of clicks for that
>> keyword on that page
>> 2.) next time for the same query(keyword), the mapped pages will be
>>  ranked
>> higher considering the no. of clicks too..
>> 3.) for every new query these steps will be repeated...
>> this was a very high level view , i have made algorithms for these
>>  modules
>> and trying to incorporate with lucene but dont know , on which files i
>>  have
>> to do edition to make it work...
>> please help me regarding this, if you need some more explanation,
>>  please let
>> me know...
>> thanks 
>> Sumit Tyagi
>> 
>> 
>> 
>> 
>> 
>> Erick Erickson wrote:
>>> 
>>> You still haven't explained *why* you want to rerank results. What
>>> is the use-case you're trying to implement? Quite often it's turned
>>> out for me that when I let folks on the list know what the use
>>> case I'm trying to support is, they come up with much more elegant
>>> solutions than I was thinking about.
>>> 
>>> For instance, does the CustomScoreQuery class have any relevance
>>> to your problem?
>>> 
>>> If you're thinking of modifying the core Lucene code for your
>>> special purpose, I'd advise against it unless and until you'd
>>  exhausted
>>> all the other options. It's always a maintenance headache to do this.
>>> 
>>> Best
>>> Erick
>>> 
>>> On Dec 21, 2007 10:09 AM, sumittyagi <[EMAIL PROTECTED]> wrote:
>>> 
>>>>
>>>> actually i am writing a module to rerank the results, so i want to
>>  edit
>>>> the
>>>> file which arrange the results and give them ranks,
>>>> or is there any other way i can use my module to rerank the results
>>>>
>>>>
>>>> markharw00d wrote:
>>>> >
>>>> > I think you need to describe your "factors" in more detail.
>>  Exactly
>>>> what
>>>> > do you want to achieve for your users?
>>>> > We could be talking about any number of Lucene functions here.
>>>> >
>>>> > ----- Original Message ----
>>>> > From: sumittyagi <[EMAIL PROTECTED]>
>>>> > To: [email protected]
>>>> > Sent: Friday, 21 December, 2007 4:51:09 AM
>>>> > Subject: Which file in the lucene package is used to manipulate
>>>> results..
>>>> >
>>>> >
>>>> > hi, i am using lucene for the very first time and want to
>>  manipulate
>>>> >  the
>>>> > results, by adding some more factors to it, which file should i
>>  edit to
>>>> > manipulate the search results....
>>>> >
>>>> > Thanks
>>>> > Sumit Tyagi
>>>> > --
>>>> > View this message in context:
>>>> >
>>>> >
>>>>
>> 
>> http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14450335.html
>>>> > Sent from the Lucene - Java Users mailing list archive at
>>  Nabble.com.
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >       __________________________________________________________
>>>> > Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
>>>> >
>>>> >
>>>> >
>>  ---------------------------------------------------------------------
>>>> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> > For additional commands, e-mail: [EMAIL PROTECTED]
>>>> >
>>>> >
>>>> >
>>>>
>>>> --
>>>> View this message in context:
>>>>
>> 
>> http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14456938.html
>>>> Sent from the Lucene - Java Users mailing list archive at
>>  Nabble.com.
>>>>
>>>>
>>>>
>>  ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>> For additional commands, e-mail: [EMAIL PROTECTED]
>>>>
>>>>
>>> 
>>> 
>> 
>> -- 
>> View this message in context:
>> 
>> http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p14476062.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>> 
>> 
>> 
>> 
>> 
>> 
>>       __________________________________________________________
>> Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>> 
>> 
>> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p15591566.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 
> 
> 
> 
>       __________________________________________________________
> Sent from Yahoo! Mail - a smarter inbox http://uk.mail.yahoo.com
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Which-file-in-the-lucene-package-is-used-to-manipulate-results..-tp14450335p15596170.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Which file in the lucene package is used to manipulate results..

Reply via email to