Hello, this is a little off topic, but how did you obtain the PageRank for each page? Did you calculate it yourself, or did you obtain it some other way while fetching a specific site?
Best.

On Thu, May 29, 2008 at 3:28 PM, 过佳 <[EMAIL PROTECTED]> wrote:
> Thanks Glen, we have tried it, but the bottleneck is fetching the document
> (indexReader.document(num)), so it is not efficient enough.
>
> 2008/5/28, Glen Newton <[EMAIL PROTECTED]>:
> >
> > You should consider keeping the PageRank (and any other more dynamic
> > data) in a separate index (with the documents in the same order as your
> > bigger, more static index) and then use a ParallelReader on both of
> > them. See:
> >
> > http://lucene.apache.org/java/2_1_0/api/org/apache/lucene/index/ParallelReader.html
> >
> > -Glen
> >
> > 2008/5/28 过佳 <[EMAIL PROTECTED]>:
> > > I think this is not suitable for my system, since the number of pages
> > > is very large and reindexing would take a long time.
> > >
> > > 2008/5/28, Ian Lea <[EMAIL PROTECTED]>:
> > > >
> > > > Yes. But you'd have to do that anyway if you are storing pagerank in
> > > > the index.
> > > >
> > > > One point on your 20s response time for sorting - is that for the
> > > > first sort or subsequent ones?
> > > > I believe that the first one will usually be substantially slower.
> > > > But sorting is always likely to be slower than not sorting.
> > > >
> > > > --
> > > > Ian.
> > > >
> > > > On Wed, May 28, 2008 at 12:47 PM, 过佳 <[EMAIL PROTECTED]> wrote:
> > > > > Thanks Ian, but does this mean that I must reindex these pages
> > > > > whenever the pagerank score changes?
> > > > >
> > > > > On 08-5-28, Ian Lea <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hi
> > > > > >
> > > > > > Maybe you could use the pagerank score, possibly modified, as
> > > > > > document boost at indexing time. From the javadocs for
> > > > > > Document.setBoost(boost)
> > > > > >
> > > > > > "Sets a boost factor for hits on any field of this document. This
> > > > > > value will be multiplied into the score of all hits on this document"
> > > > > >
> > > > > > so will give you P * R rather than P + R. Should be quick, though.
> > > > > >
> > > > > > --
> > > > > > Ian.
> > > > > >
> > > > > > On Wed, May 28, 2008 at 11:02 AM, 过佳 <[EMAIL PROTECTED]> wrote:
> > > > > > > Hi all,
> > > > > > > I have a problem: how to "combine" two scores to sort the
> > > > > > > search result documents.
> > > > > > > For example, I have 10 million pages in a Lucene index, and I
> > > > > > > know their pagerank scores. When I run a query, every doc
> > > > > > > returned has a Lucene score, call it R (relevance score), and
> > > > > > > I also have its pagerank score, call it P. What I need is to
> > > > > > > sort the search results by the value "P+R". If I store the
> > > > > > > pagerank score in the index, fetch it at search time, compute
> > > > > > > P+R, and then sort, it is too slow. In my system, when a
> > > > > > > search hits 500000 results, the sort may take about 20s.
> > > > > > > Sorry for my poor English. Does anyone have a good idea?
> > > > > > >
> > > > > > > Best
> > > > > > > Jarvis
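
For reference, here is a minimal sketch of the P + R ordering discussed above. It is plain Java, not Lucene API, and it rests on assumptions that are not from the thread: the PageRank values fit in an in-memory float array indexed by document id, and the search layer can hand over each hit's document id and relevance score without loading the stored document. The names CombinedScoreSketch, Hit and topK are made up for illustration.

```java
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: ranks hits by P + R using a
// PageRank array held in memory, so no stored document has to be loaded.
final class CombinedScoreSketch {

    /** One search hit: a document id plus its combined score P + R. */
    static final class Hit {
        final int docId;
        final float combined;

        Hit(int docId, float combined) {
            this.docId = docId;
            this.combined = combined;
        }
    }

    /**
     * Returns the ids of the k hits with the highest P + R, best first.
     *
     * @param hitDocIds       document ids of the hits, in any order
     * @param relevance       relevance score R of each hit, parallel to hitDocIds
     * @param pageRankByDocId PageRank P of every document, indexed by document id
     * @param k               number of results wanted
     */
    static int[] topK(int[] hitDocIds, float[] relevance,
                      float[] pageRankByDocId, int k) {
        if (k <= 0 || hitDocIds.length == 0) {
            return new int[0];
        }
        // Min-heap on the combined score: the weakest kept hit sits on top,
        // so each of the n hits costs at most O(log k) instead of a full sort.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((a, b) -> Float.compare(a.combined, b.combined));
        for (int i = 0; i < hitDocIds.length; i++) {
            int docId = hitDocIds[i];
            float combined = pageRankByDocId[docId] + relevance[i];
            if (heap.size() < k) {
                heap.add(new Hit(docId, combined));
            } else if (combined > heap.peek().combined) {
                heap.poll();
                heap.add(new Hit(docId, combined));
            }
        }
        // The heap yields the weakest hit first, so fill the result backwards.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll().docId;
        }
        return result;
    }
}
```

With 500000 hits and only the first page of results needed, this keeps a heap of k entries rather than sorting every hit. Whether it removes the 20s delay depends on how the hits and scores are collected, which the sketch does not cover.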