Re: How to add PageRank score with lucene's relevant score in sorting

过佳 Thu, 29 May 2008 05:29:23 -0700

thanks Glen , we have tried it , but the bottleneck is to get the document
(indexReader.document(num)), so it is not efficient enough .


2008/5/28, Glen Newton <[EMAIL PROTECTED]>:
>
> You should consider keeping the PageRank (and any other more dynamic
> data) in a separate index (with the documents in the same oder as your
> bigger, more static index) and then use a ParallelReader on both of
> them. See:
>
> http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReader.html
>
> -Glen
>
> 2008/5/28 过佳 <[EMAIL PROTECTED]>:
> > I think this is not suitable for my system since the num of pages is very
> > large that will cost much time for reindex
> >
> > 2008/5/28, Ian Lea <[EMAIL PROTECTED]>:
> >>
> >> Yes.  But you'd have to do that anyway if you are storing pagerank in
> the
> >> index.
> >>
> >> One point on your 20s response time for sorting - is that for the
> >> first sort or subsequent ones?
> >> I believe that the first one will usually be substantially slower.
> >> But sorting is always likely to be slower than not sorting.
> >>
> >>
> >> --
> >> Ian.
> >>
> >>
> >> On Wed, May 28, 2008 at 12:47 PM, 过佳 <[EMAIL PROTECTED]> wrote:
> >> > thanks lan, but this means that i must reindex these pages while the
> >> > pagerank score changed?
> >> >
> >> > 在08-5-28，Ian Lea <[EMAIL PROTECTED]> 写道：
> >> >>
> >> >> Hi
> >> >>
> >> >>
> >> >> Maybe you could use the pagerank score, possibly modified, as
> document
> >> >> boost at indexing time.  From the javadocs for
> >> >> Document.setBoost(boost)
> >> >>
> >> >> "Sets a boost factor for hits on any field of this document. This
> >> >> value will be multiplied into the score of all hits on this document"
> >> >>
> >> >> so will give you P * R rather than P + R.  Should be quick, though.
> >> >>
> >> >>
> >> >> --
> >> >> Ian.
> >> >>
> >> >>
> >> >> On Wed, May 28, 2008 at 11:02 AM, 过佳 <[EMAIL PROTECTED]> wrote:
> >> >> > hi all ,
> >> >> >     I have a problem that how to "combine" two score to sort the
> >> search
> >> >> > result documents.
> >> >> >     for example I  have 10 million pages in lucene index , and i
> know
> >> >> their
> >> >> > pagerank scores. i give a query to it , every docs returned have a
> >> >> > lucene-score, mark it as R (relevant score), and  i  also  have its
> >> >> > pagerank score, mark it as P,  what i need is i want to sort the
> >> search
> >> >> > result base on the value "P+R".  You know if i store the pagerank
> >> score
> >> >> in
> >> >> > index and get it every search time , then compute P+R , then sort
> it ,
> >> >> this
> >> >> > way is too slow. in my system , when the search hits 500000 result
> ,
> >> the
> >> >> > sort may cost about 20s.
> >> >> >   Sorry for my poor english.  Anyone has a good idea?
> >> >> >
> >> >> > Best
> >> >> > Jarvis
> >> >> >
> >> >>
> >> >
> >>
> >
>
>
>
> --
>
> -
>

Re: How to add PageRank score with lucene's relevant score in sorting

Reply via email to