You should consider keeping the PageRank (and any other more dynamic
data) in a separate index (with the documents in the same oder as your
bigger, more static index) and then use a ParallelReader on both of
them. See:
  
http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReader.html

-Glen

2008/5/28 过佳 <[EMAIL PROTECTED]>:
> I think this is not suitable for my system since the num of pages is very
> large that will cost much time for reindex
>
> 2008/5/28, Ian Lea <[EMAIL PROTECTED]>:
>>
>> Yes.  But you'd have to do that anyway if you are storing pagerank in the
>> index.
>>
>> One point on your 20s response time for sorting - is that for the
>> first sort or subsequent ones?
>> I believe that the first one will usually be substantially slower.
>> But sorting is always likely to be slower than not sorting.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, May 28, 2008 at 12:47 PM, 过佳 <[EMAIL PROTECTED]> wrote:
>> > thanks lan, but this means that i must reindex these pages while the
>> > pagerank score changed?
>> >
>> > 在08-5-28,Ian Lea <[EMAIL PROTECTED]> 写道:
>> >>
>> >> Hi
>> >>
>> >>
>> >> Maybe you could use the pagerank score, possibly modified, as document
>> >> boost at indexing time.  From the javadocs for
>> >> Document.setBoost(boost)
>> >>
>> >> "Sets a boost factor for hits on any field of this document. This
>> >> value will be multiplied into the score of all hits on this document"
>> >>
>> >> so will give you P * R rather than P + R.  Should be quick, though.
>> >>
>> >>
>> >> --
>> >> Ian.
>> >>
>> >>
>> >> On Wed, May 28, 2008 at 11:02 AM, 过佳 <[EMAIL PROTECTED]> wrote:
>> >> > hi all ,
>> >> >     I have a problem that how to "combine" two score to sort the
>> search
>> >> > result documents.
>> >> >     for example I  have 10 million pages in lucene index , and i know
>> >> their
>> >> > pagerank scores. i give a query to it , every docs returned have a
>> >> > lucene-score, mark it as R (relevant score), and  i  also  have its
>> >> > pagerank score, mark it as P,  what i need is i want to sort the
>> search
>> >> > result base on the value "P+R".  You know if i store the pagerank
>> score
>> >> in
>> >> > index and get it every search time , then compute P+R , then sort it ,
>> >> this
>> >> > way is too slow. in my system , when the search hits 500000 result ,
>> the
>> >> > sort may cost about 20s.
>> >> >   Sorry for my poor english.  Anyone has a good idea?
>> >> >
>> >> > Best
>> >> > Jarvis
>> >> >
>> >>
>> >
>>
>



-- 

-

Reply via email to