You should consider keeping the PageRank (and any other more dynamic data) in a separate index (with the documents in the same oder as your bigger, more static index) and then use a ParallelReader on both of them. See: http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReader.html
-Glen 2008/5/28 过佳 <[EMAIL PROTECTED]>: > I think this is not suitable for my system since the num of pages is very > large that will cost much time for reindex > > 2008/5/28, Ian Lea <[EMAIL PROTECTED]>: >> >> Yes. But you'd have to do that anyway if you are storing pagerank in the >> index. >> >> One point on your 20s response time for sorting - is that for the >> first sort or subsequent ones? >> I believe that the first one will usually be substantially slower. >> But sorting is always likely to be slower than not sorting. >> >> >> -- >> Ian. >> >> >> On Wed, May 28, 2008 at 12:47 PM, 过佳 <[EMAIL PROTECTED]> wrote: >> > thanks lan, but this means that i must reindex these pages while the >> > pagerank score changed? >> > >> > 在08-5-28,Ian Lea <[EMAIL PROTECTED]> 写道: >> >> >> >> Hi >> >> >> >> >> >> Maybe you could use the pagerank score, possibly modified, as document >> >> boost at indexing time. From the javadocs for >> >> Document.setBoost(boost) >> >> >> >> "Sets a boost factor for hits on any field of this document. This >> >> value will be multiplied into the score of all hits on this document" >> >> >> >> so will give you P * R rather than P + R. Should be quick, though. >> >> >> >> >> >> -- >> >> Ian. >> >> >> >> >> >> On Wed, May 28, 2008 at 11:02 AM, 过佳 <[EMAIL PROTECTED]> wrote: >> >> > hi all , >> >> > I have a problem that how to "combine" two score to sort the >> search >> >> > result documents. >> >> > for example I have 10 million pages in lucene index , and i know >> >> their >> >> > pagerank scores. i give a query to it , every docs returned have a >> >> > lucene-score, mark it as R (relevant score), and i also have its >> >> > pagerank score, mark it as P, what i need is i want to sort the >> search >> >> > result base on the value "P+R". You know if i store the pagerank >> score >> >> in >> >> > index and get it every search time , then compute P+R , then sort it , >> >> this >> >> > way is too slow. in my system , when the search hits 500000 result , >> the >> >> > sort may cost about 20s. >> >> > Sorry for my poor english. Anyone has a good idea? >> >> > >> >> > Best >> >> > Jarvis >> >> > >> >> >> > >> > -- -