Never mind of how to open the ParallellReader stuff (I am an idiot): RTFM: http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/ParallelReader.html
But the rest is of course interesting :) /M On Thu, Apr 23, 2009 at 11:42 PM, Marcus Herou <marcus.he...@tailsweep.com>wrote: > Thanks! (I started my reply and then saw that you added code snippets) > > I think we are narrowing down the problem to the updating issue of the > PageRank score. > > So what you basically are saying is that: > > 1. You have an index which contains data that is more or less static (no > updates) or you have another update interval than the PR interval. > 2. A PR index which is rebuilt (from scratch ?) every X days/weeks/months. > > Commenting inline. > > Cheers and thanks for everything. > > //Marcus > > On Thu, Apr 23, 2009 at 11:28 PM, Steven Bethard <beth...@stanford.edu>wrote: > >> On 4/23/2009 2:08 PM, Marcus Herou wrote: >> > But perhaps one could use a FieldCache somehow ? >> >> Some code snippets that may help. I add the PageRank value as a field of >> the documents I index with Lucene like this: >> >> Document document = new Document(); >> double pageRank = this.pageRanks.getCount(article.getId()); >> document.add(new Field( >> PAGE_RANK_FIELD_NAME, Float.toString((float)pageRank), >> Field.Store.YES, Field.Index.NOT_ANALYZED)); > > > Just as I do but I store the score/PageRank in the "main" index which is a > pain in the ass since I want to be able to update the score field quite > frequently at least every week. > >> >> >> Then when I want to get the value back, I use a FieldScoreQuery, which >> just returns the field value as the document score, like this: >> >> new FieldScoreQuery(PAGE_RANK_FIELD_NAME, FieldScoreQuery.Type.FLOAT); > > > Great. Never seen that query type before. Will it use that fields score > solely as the base for rank ? Can it be combinable with other Query types or > am I missing something ? > What if I instead sorted on that column ? which is most efficient ? > Example: searcher.search(q, null, 100000, new Sort("score", true)); > Then we get to the issue of how to open two indexes and use one > IndexSearcher across them and perhaps a MultiFieldQueryParser to produce the > Query. > Pointers ? > > >> >> >> If you want to combine the PageRank score with another Query score, then >> you can look at CustomScoreQuery to do so. > > You mean by perhaps combining it with user supplied boost like field:term^4 > ? > Will the score passed in be the score of the FieldScoreQuery if used or the > built in one ? > > >> >> >> Steve >> >> > On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou >> > <marcus.he...@tailsweep.com>wrote: >> > >> >> Yes I have considered it for 30 minutes :) >> >> >> >> How do one apply that in the real world ? >> >> >> >> If the only thing I get access to is the actual docId would it not be >> >> really expensive to get the Document itself from the index and later >> use >> >> some field in it as external lookup in some optimized structure for >> this ? >> >> >> >> Example, pseudo: >> >> >> >> *public* *float* customScore(*int* doc, *float* subQueryScore, *float* >> valSrcScore) >> >> >> >> { >> >> *Document document = indexSearcher.doc(doc); >> >> float score = >> MyOptimalHashStructure.getScore(document.get("someId")); >> >> return score**subQueryScore*;* >> >> >> >> } >> >> >> >> This would not scale well right ? I mean gathering scores through 100M >> docs >> >> would take some time I guess ? Or even 1M docs... >> >> >> >> Please push me in the right direction. >> >> >> >> Cheers >> >> >> >> //Marcus >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Thu, Apr 23, 2009 at 10:58 PM, Doron Cohen <cdor...@gmail.com> >> wrote: >> >> >> >>>> I think we are doing similar things, at least I am trying to >> implement >> >>>> document boosting with pagerank. Having issues of howto appky the >> >>> scoring >> >>>> of >> >>>> specific docs without actually reindex them. I feel something should >> be >> >>>> done >> >>>> at query time which looks at external data but do not know howto >> >>> implement >> >>>> that. Do you ? >> >>>> >> >>> Have you considered CustomScoreQuery in o.a.l.search.function ? It >> should >> >>> allow >> >>> incorporating external scores. >> >>> >> >>> Doron >> >>> >> >> >> >> >> >> -- >> >> Marcus Herou CTO and co-founder Tailsweep AB >> >> +46702561312 >> >> marcus.he...@tailsweep.com >> >> http://www.tailsweep.com/ >> >> http://blogg.tailsweep.com/ >> >> >> > >> > >> > >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > > -- > Marcus Herou CTO and co-founder Tailsweep AB > +46702561312 > marcus.he...@tailsweep.com > http://www.tailsweep.com/ > http://blogg.tailsweep.com/ > -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/