Re: Performance problem

Erik Hatcher Wed, 24 Aug 2005 06:34:02 -0700


On Aug 24, 2005, at 3:32 AM, Wolfgang Täger wrote:

Dear all,
we are using Lucene to store 10 million bilingual sentence pairs for doing
some natural language processing with them. Each document contains a
sentence, its translation and a topical code. We want to select sentences
containing certain words and do statistics over the topical codes in order
to detect translations which depend on the topic (like key => Taste (topic:
input devices), key => Schlüssel (topic: cryptography)).

While the search is carried out in a reasonably short time (about
500..800ms) we have a performance problem with actually retrieving the
documents by code like:

for (int i = nrhits - 1; i >= 0; i--) {
        Document hitDoc = hits.doc(i);
        String code = hitDoc.get("code");
        // ... statistics
}
Even when restricting nrhits to 2000, we have to wait 10..20 seconds just
for the retrieval. Since the documents are so short we would have expected
a quicker retrieval. BTW the loop was done in inverse order in the hope of
accelerating the retrieval.

How many documents are you trying to retrieve? I think you'll have much better luck if you walk the documents in ascending Hits order rather than backwards, as Hits caches documents with the presumption you'll move forward through them. I'd be curious to see how much (or if) moving forwards through Hits helps.
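
For illustration, here is a rough sketch of the same loop walking forward instead of backward while counting the topical codes. It does not use Lucene itself: HitList is a hypothetical stand-in for the two Hits calls in the original loop (length() and doc(i)), each document is modeled as a plain field map, and TopicTally and its methods are made-up names. It also assumes a current Java version, so treat it as a shape to adapt rather than drop-in code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the parts of Hits used in the loop:
// the number of hits and access to the i-th document's stored fields.
interface HitList {
    int length();
    Map<String, String> doc(int i);
}

final class TopicTally {
    private TopicTally() {}

    // Walk the hits in ascending order (0, 1, 2, ...), stopping after
    // maxHits documents, and count how often each "code" value occurs.
    static Map<String, Integer> tally(HitList hits, int maxHits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int n = Math.min(hits.length(), maxHits);
        for (int i = 0; i < n; i++) {
            String code = hits.doc(i).get("code");
            if (code != null) {
                counts.merge(code, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Wrap an in-memory list so the sketch can be tried without an index.
    static HitList of(List<Map<String, String>> docs) {
        return new HitList() {
            @Override public int length() { return docs.size(); }
            @Override public Map<String, String> doc(int i) { return docs.get(i); }
        };
    }
}
```

With a real index, the body of tally would correspond to calling hits.doc(i) and hitDoc.get("code") as in the original loop, only with i counting up from 0. Timing both directions on the same query would show whether the order matters in practice.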


    Erik
