Thanks! (I started my reply and then saw that you added code snippets) I think we are narrowing down the problem to the updating issue of the PageRank score.
So what you basically are saying is that: 1. You have an index which contains data that is more or less static (no updates) or you have another update interval than the PR interval. 2. A PR index which is rebuilt (from scratch ?) every X days/weeks/months. Commenting inline. Cheers and thanks for everything. //Marcus On Thu, Apr 23, 2009 at 11:28 PM, Steven Bethard <beth...@stanford.edu>wrote: > On 4/23/2009 2:08 PM, Marcus Herou wrote: > > But perhaps one could use a FieldCache somehow ? > > Some code snippets that may help. I add the PageRank value as a field of > the documents I index with Lucene like this: > > Document document = new Document(); > double pageRank = this.pageRanks.getCount(article.getId()); > document.add(new Field( > PAGE_RANK_FIELD_NAME, Float.toString((float)pageRank), > Field.Store.YES, Field.Index.NOT_ANALYZED)); Just as I do but I store the score/PageRank in the "main" index which is a pain in the ass since I want to be able to update the score field quite frequently at least every week. > > > Then when I want to get the value back, I use a FieldScoreQuery, which > just returns the field value as the document score, like this: > > new FieldScoreQuery(PAGE_RANK_FIELD_NAME, FieldScoreQuery.Type.FLOAT); Great. Never seen that query type before. Will it use that fields score solely as the base for rank ? Can it be combinable with other Query types or am I missing something ? What if I instead sorted on that column ? which is most efficient ? Example: searcher.search(q, null, 100000, new Sort("score", true)); Then we get to the issue of how to open two indexes and use one IndexSearcher across them and perhaps a MultiFieldQueryParser to produce the Query. Pointers ? > > > If you want to combine the PageRank score with another Query score, then > you can look at CustomScoreQuery to do so. You mean by perhaps combining it with user supplied boost like field:term^4 ? Will the score passed in be the score of the FieldScoreQuery if used or the built in one ? > > > Steve > > > On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou > > <marcus.he...@tailsweep.com>wrote: > > > >> Yes I have considered it for 30 minutes :) > >> > >> How do one apply that in the real world ? > >> > >> If the only thing I get access to is the actual docId would it not be > >> really expensive to get the Document itself from the index and later use > >> some field in it as external lookup in some optimized structure for this > ? > >> > >> Example, pseudo: > >> > >> *public* *float* customScore(*int* doc, *float* subQueryScore, *float* > valSrcScore) > >> > >> { > >> *Document document = indexSearcher.doc(doc); > >> float score = > MyOptimalHashStructure.getScore(document.get("someId")); > >> return score**subQueryScore*;* > >> > >> } > >> > >> This would not scale well right ? I mean gathering scores through 100M > docs > >> would take some time I guess ? Or even 1M docs... > >> > >> Please push me in the right direction. > >> > >> Cheers > >> > >> //Marcus > >> > >> > >> > >> > >> > >> > >> > >> On Thu, Apr 23, 2009 at 10:58 PM, Doron Cohen <cdor...@gmail.com> > wrote: > >> > >>>> I think we are doing similar things, at least I am trying to implement > >>>> document boosting with pagerank. Having issues of howto appky the > >>> scoring > >>>> of > >>>> specific docs without actually reindex them. I feel something should > be > >>>> done > >>>> at query time which looks at external data but do not know howto > >>> implement > >>>> that. Do you ? > >>>> > >>> Have you considered CustomScoreQuery in o.a.l.search.function ? It > should > >>> allow > >>> incorporating external scores. > >>> > >>> Doron > >>> > >> > >> > >> -- > >> Marcus Herou CTO and co-founder Tailsweep AB > >> +46702561312 > >> marcus.he...@tailsweep.com > >> http://www.tailsweep.com/ > >> http://blogg.tailsweep.com/ > >> > > > > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/