Currently we use Lucene 2.3.2. The reason we recreate the searcher each time is that within one server we manage a few thousand independent Lucene index data folders. The folders vary in size; the largest have about 200K docs (and are growing).
Thanks very much for your help, Lisheng

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Thursday, December 01, 2011 11:34 AM
To: Zhang, Lisheng
Cc: java-user@lucene.apache.org
Subject: Re: Boost more recent document

On Thu, Dec 1, 2011 at 8:30 PM, Zhang, Lisheng
<lisheng.zh...@broadvision.com> wrote:
> Hi Simon,
>
> 1) Thanks for suggesting the lucene 4.0 feature; we will make use of it as
> soon as we upgrade lucene.
>
> 2) Currently we recreate IndexSearcher for each query, which means recreating
> the underlying IndexReader for each query (I should have said IndexReader), but
> sort performance is OK, so I would like to try CustomScoreQuery without
> cache first?

WOW - why do you do this? Can't you use the SearcherManager in Lucene 3.5?

simon
>
> Thanks very much for your help, Lisheng
>
> -----Original Message-----
> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
> Sent: Thursday, December 01, 2011 11:21 AM
> To: Zhang, Lisheng
> Cc: java-user@lucene.apache.org
> Subject: Re: Boost more recent document
>
>
> On Thu, Dec 1, 2011 at 7:36 AM, Zhang, Lisheng
> <lisheng.zh...@broadvision.com> wrote:
>> Hi Simon,
>>
>> Sorry, I found that I cannot use payloads for this purpose because payloads
>> can be accessed only through term positions, but we do not use the timestamp
>> in queries. Ideally it would be great if we could have some doc-level "payload"
>> accessible through docId?
>
> lucene 4 has a feature called IndexDocValues which is essentially a
> payload per document per field.
>
> you can read about it here:
> http://www.searchworkings.org/blog/-/blogs/introducing-lucene-index-doc-values
> http://www.searchworkings.org/blog/-/blogs/apache-lucene-flexiblescoring-with-indexdocvalues
> http://www.searchworkings.org/blog/-/blogs/indexdocvalues-their-applications
>>
>> Then your initial suggestion to use CustomScoreQuery would be our solution.
>> From the source code I see that sorting is implemented with FieldCache, and its
>> performance seems OK even though we did not cache the reader. So we will use
>> CustomScoreQuery without caching for now (cutting the timestamp to hour or day
>> may help); if it is too slow we may consider a selective cache.
>
> what do you mean by cache readers?
>
> simon
>>
>> Thanks very much for all your great help. Please point out anything wrong
>> in the statements above.
>>
>> Best regards, Lisheng
>>
>> -----Original Message-----
>> From: Zhang, Lisheng [mailto:lisheng.zh...@broadvision.com]
>> Sent: Wednesday, November 30, 2011 1:40 PM
>> To: java-user@lucene.apache.org; simon.willna...@gmail.com
>> Subject: RE: Boost more recent document
>>
>>
>> Hi,
>>
>> Thanks for the very interesting idea!
>>
>> Currently we use lucene 2.3.2 and we just use the default merge policy (at
>> any time we have a few segments, and after some accumulation small segments
>> are merged into big ones). I need to double-check whether docId can reflect
>> doc age.
>>
>> But I have one concern: docId may not reflect the true age interval; a docId
>> difference of 2 may mean 2m or 1h. If there is no better choice, I may just
>> use payloads and adapt a few query classes?
>>
>> Thanks very much for your help, Lisheng
>>
>> -----Original Message-----
>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>> Sent: Wednesday, November 30, 2011 1:02 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: Boost more recent document
>>
>>
>> If you use LogMergePolicy, i.e. do merges in order, you could use the
>> absolute docID as a relative age value.
>> Larger docIDs mean younger (more recently added) documents. Maybe this works for you?
>>
>> simon
>>
>> On Wed, Nov 30, 2011 at 9:08 PM, Zhang, Lisheng
>> <lisheng.zh...@broadvision.com> wrote:
>>> Thanks very much for your help! I got the point; the only problem is that
>>> I cannot afford to use FieldCache because in our app we have many
>>> lucene index data folders. Is there another simple way?
>>>
>>> Thanks again, Lisheng
>>>
>>> -----Original Message-----
>>> From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
>>> Sent: Wednesday, November 30, 2011 11:40 AM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: Boost more recent document
>>>
>>>
>>> On Wed, Nov 30, 2011 at 6:59 PM, Zhang, Lisheng
>>> <lisheng.zh...@broadvision.com> wrote:
>>>> Hi,
>>>>
>>>> We need to boost documents that are more recent (each doc has a timestamp
>>>> attribute). It seems that we cannot use a doc boost at index time because
>>>> it will be condensed into one byte (which cannot differentiate 365 days),
>>>> so we may use a payload (saving the timestamp as the payload) to boost at
>>>> search time.
>>>>
>>>> In our app we let users enter a query in the browser and use QueryParser
>>>> to generate the query. The query can be of different types (TermQuery,
>>>> BooleanQuery, WildcardQuery, ...), so it seems we would need to create a
>>>> customized version of each query class, similar to PayloadTermQuery.
>>>> Is there a simpler way?
>>>
>>> you can simply index your timestamp (untokenized) and wrap your query
>>> in a CustomScoreQuery. This query accepts your user query and a
>>> ValueSource. During search CustomScoreQuery calls your ValueSource for
>>> each document that the user query scores and multiplies the result of
>>> the ValueSource into the score. Inside your ValueSource you can simply
>>> get the timestamps from the FieldCache and calculate your custom
>>> boost...
>>>
>>> hope that helps
>>>
>>> simon
>>>>
>>>> Thanks very much for your help, Lisheng
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
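Simon's CustomScoreQuery suggestion comes down to a per-document function of the indexed timestamp whose result is multiplied into the score the user query produced. The sketch below shows only that arithmetic in plain Java, so it does not depend on any Lucene version. Assumptions: the int[] stands in for the per-docId array FieldCache would return, timestamps are truncated to whole days (Lisheng's idea of cutting the timestamp to hour or day), and the half-life decay curve plus the names RecencyBoost, halfLifeDays and nowDay are hypothetical choices for illustration, not part of Lucene's API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: the per-document math a custom ValueSource could do
// for CustomScoreQuery. The int[] argument stands in for the per-docId array
// that FieldCache would return for an untokenized "day" field.
final class RecencyBoost {
    private final double halfLifeDays; // assumed tuning knob: boost halves after this many days
    private final int nowDay;          // "today", in whole days since the epoch

    RecencyBoost(double halfLifeDays, int nowDay) {
        if (halfLifeDays <= 0) {
            throw new IllegalArgumentException("halfLifeDays must be positive");
        }
        this.halfLifeDays = halfLifeDays;
        this.nowDay = nowDay;
    }

    // Truncate a millisecond timestamp to whole days since the epoch:
    // fewer distinct values, and the result fits in an int.
    static int toDay(long epochMillis) {
        return (int) TimeUnit.MILLISECONDS.toDays(epochMillis);
    }

    // 1.0 for documents dated today (or later), halving every halfLifeDays.
    float boost(int docDay) {
        int ageDays = Math.max(0, nowDay - docDay);
        return (float) Math.pow(0.5, ageDays / halfLifeDays);
    }

    // As described in the thread: the value-source result is multiplied
    // into the score the user query produced for this document.
    float customScore(int docId, float subQueryScore, int[] dayByDocId) {
        return subQueryScore * boost(dayByDocId[docId]);
    }
}
```

In Lucene 2.x/3.x this logic would sit behind a ValueSource in the org.apache.lucene.search.function package (roughly, its per-reader values' floatVal(doc) returning boost(day)); check the exact signatures against the version in use. On memory: at 4 bytes per document, an int[] for one of the ~200K-document indexes is about 800 KB. FieldCache entries are tied to a reader instance, so re-opening the reader for every query also means re-loading this array on every query, which is worth measuring.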
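Simon's alternative avoids FieldCache by using the docID itself as the age signal. A minimal sketch, assuming documents are only ever appended and merges keep segments in order (as with LogMergePolicy), so that a larger docID was added later: docId is mapped linearly onto a boost range, with maxDoc coming from IndexReader.maxDoc(). The class name DocIdRecency and the floor parameter are hypothetical.

```java
// Hypothetical helper for the docID-as-age idea. Assumes documents are only
// appended and segments are merged in order, so a larger docId was added later.
final class DocIdRecency {
    private DocIdRecency() {}

    // Maps docId in [0, maxDoc) linearly onto (floor, 1.0]: the newest document
    // (docId == maxDoc - 1) gets 1.0, the oldest gets just above floor.
    static float boost(int docId, int maxDoc, float floor) {
        if (maxDoc <= 0 || docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId must be in [0, maxDoc)");
        }
        float rank = (docId + 1) / (float) maxDoc; // in (0, 1]
        return floor + (1.0f - floor) * rank;
    }
}
```

As Lisheng points out, this yields an ordering rather than an age: a docID gap of 2 can mean minutes or hours. Also, an updated document (delete plus re-add) receives a new docID at the end and therefore ranks as the newest.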
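Lisheng mentions caching selectively as a fallback if re-opening a reader per query proves too slow across a few thousand index folders. One possible reading of that, sketched here as an assumption rather than what either poster proposed, keeps only the N most recently used per-folder handles open and closes the least recently used one on eviction. LruHandleCache and the opener function are hypothetical; the values can be any Closeable, for example a small wrapper that closes an IndexSearcher and its IndexReader.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache of open per-folder handles (e.g. searcher wrappers).
final class LruHandleCache<K, V extends Closeable> {
    private final LinkedHashMap<K, V> map;
    private final Function<K, V> opener; // opens a handle for a key on a cache miss

    LruHandleCache(int maxOpen, Function<K, V> opener) {
        if (maxOpen <= 0) {
            throw new IllegalArgumentException("maxOpen must be positive");
        }
        this.opener = opener;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxOpen) {
                    closeQuietly(eldest.getValue()); // close the least recently used handle
                    return true;
                }
                return false;
            }
        };
    }

    // Returns the cached handle for key, opening it (and possibly evicting) on a miss.
    synchronized V get(K key) {
        V handle = map.get(key); // a hit also marks the entry as most recently used
        if (handle == null) {
            handle = opener.apply(key);
            map.put(key, handle);
        }
        return handle;
    }

    synchronized int size() {
        return map.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // best effort: eviction should not fail the caller's query
        }
    }
}
```

This sketch closes an evicted handle immediately. With concurrent queries, a real version would need reference counting (or something like the SearcherManager Simon mentions) so a searcher is not closed while a query is still using it.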