In a web application, I generally cache the IndexSearcher in application scope and reuse it for all requests.

You will have to balance the demand for timely updates against the time it takes to build up the sort caches. You can't really have both instantaneous visibility of newly added documents and fully optimized sorting (or any other operation that relies on building up caches from an IndexReader/IndexSearcher). Many folks have implemented IndexSearcher warming in the background of their applications, which is a built-in feature of Solr. So you may want to have a look at how Solr does its magic, or simply use Solr flat out :)
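
A minimal sketch of the cache-and-warm idea, under stated assumptions: WarmedHolder is a hypothetical class, not part of Lucene or Solr, and the type parameter T stands in for an IndexSearcher. The opener and warmer callbacks are also illustrative names; in a real application the opener would open a new searcher on the index and the warmer would run a few sorted queries so the sort caches are filled before any request sees the new instance.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical application-scope holder; T stands in for an IndexSearcher.
final class WarmedHolder<T> {
    private final AtomicReference<T> current = new AtomicReference<>();
    private final Supplier<T> opener; // assumption: opens a fresh searcher
    private final Consumer<T> warmer; // assumption: runs queries that fill caches

    WarmedHolder(Supplier<T> opener, Consumer<T> warmer) {
        this.opener = opener;
        this.warmer = warmer;
        refresh(); // the first instance is opened and warmed before use
    }

    // The shared instance handed to every request.
    T get() {
        return current.get();
    }

    // Opens a replacement, warms it, then publishes it atomically.
    // Returns the previous instance (null on the first call) so the
    // caller can decide when it is safe to close it.
    T refresh() {
        T fresh = opener.get();
        warmer.accept(fresh);
        return current.getAndSet(fresh);
    }
}
```

Requests keep calling get() while refresh() runs on a background thread, so they only pay the cache-building cost if they arrive before the very first warm-up completes. The sketch does not handle closing the old instance while requests may still be using it; that needs reference counting or a grace period.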

        Erik


On Mar 20, 2007, at 4:31 PM, David Seltzer wrote:

Erik,

I'm not using a cached IndexSearcher. Is this an option in an
environment where the underlying index changes on a second-by-second
basis? At what layer would a cached IndexSearcher be cached? At the
tomcat layer?

Caching at the object layer seems like it might help, but it doesn't
address my underlying concern, i.e., the relative performance difference
between natural order and sorted order. Maybe you're right, and I
shouldn't be worried about the very first search against the index.

How would a cached searcher implementation look?

-Dave

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 20, 2007 4:03 PM
To: java-user@lucene.apache.org
Subject: Re: Sort Performance Question

Are you using a cached IndexSearcher such that successive sorts on
the same field will be more efficient?

        Erik


On Mar 20, 2007, at 3:39 PM, David Seltzer wrote:

Hi All,



I have a sort performance question:



I have a fairly large index consisting of chunks of full-text
transcriptions of television, radio and other media, and I'm trying to
make it searchable and sortable by date. The search front-end uses a
ParallelMultiSearcher to search up to three indexes at a time (each
index contains a month of live data). When I search for the word
"toast" (for example) sorted by score, the results come back in about
1200ms; when I sort by DateTime, the results come back in 3900ms.



Initially I was sorting based on a unixtime field, but having read up
on it, I switched to a slightly easier format: "yyyyMMddHHmm". Now this
value is still larger than an int, so I went one step further and
created two more fields for test purposes: SortDate, which is yyyyMMdd,
and SortTime, which is HHmm. When I sort by SortDate then SortTime, the
results come in even slower, around 4300ms.
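
To make the size argument concrete, here is a small sketch in plain Java (no Lucene involved). It assumes the lowercase dd (day of month) pattern and uses an arbitrary sample instant in March 2007; the class and method names are made up for illustration. It shows that the combined 12-digit key does not fit in a 32-bit int, while the separate date and time keys do.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper for checking whether a formatted sort key fits in an int.
class SortKeyRange {
    // Formats the date with the given pattern in UTC so the output is reproducible.
    static String format(String pattern, Date date) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(date);
    }

    // True when the digit string parses to a value within the 32-bit int range.
    static boolean fitsInInt(String digits) {
        long value = Long.parseLong(digits);
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        Date sample = new Date(1174408260000L); // 2007-03-20 16:31:00 UTC
        for (String pattern : new String[] {"yyyyMMddHHmm", "yyyyMMdd", "HHmm"}) {
            String key = format(pattern, sample);
            System.out.println(pattern + " -> " + key + " fits in int: " + fitsInInt(key));
        }
    }
}
```

Integer.MAX_VALUE is 2147483647 (10 digits), so any 12-digit yyyyMMddHHmm value overflows it, whereas the 8-digit date and 4-digit time values stay well within range.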



To summarize:



//The sorting fields look like this:

new Field("SortDateTime", sdfDateTime.format(dMySortDateTime),
          Field.Store.YES, Field.Index.UN_TOKENIZED);

new Field("SortDate", sdfDate.format(dMySortDateTime),
          Field.Store.YES, Field.Index.UN_TOKENIZED);

new Field("SortTime", sdfTime.format(dMySortDateTime),
          Field.Store.YES, Field.Index.UN_TOKENIZED);



//and the performance looks like this:



//sort by score

Sort sSortOrder = Sort.RELEVANCE; //1200ms



//sort by datetime

Sort sSortOrder = new Sort("SortDateTime", true); //3900ms



//sort by date then time

Sort sSortOrder = new Sort(new SortField[] {
    new SortField("SortDate", SortField.INT, bReverse),
    new SortField("SortTime", SortField.INT, bReverse)
}); //4300ms





The two indexes that are being searched at the moment look like this:



Index 1:

Index Path: /storage/unisearch/MMS_index/2007.02/

Index Size on Disk: 1,400,569 KB

Number of Records: 2682238

Index Version: 03/13/2007



Index 2:

Index Path: /storage/unisearch/MMS_index/2007.03/

Index Size on Disk: 2,055,199 KB

Number of Records: 3457434

Index Version: 03/13/2007



The search is being performed in tomcat and I'm running
org.apache.lucene - build 2007-02-14 on a Dual 3.4GHz Xeon w/ 2GB
memory and Red Hat 3.4.3-22.



So, on to the question: is this fast, slow, or normal?

Along with the obvious follow-up: if it's slow, how can I make it
faster?



Thanks for your help!



-Dave



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

