See below.
On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote:
hi,
first thank you for the fast reply.
I use a MultiSearcher that opens 3 indexes, so this surely makes the whole
operation slower, but 20 seconds for 5260 results out of a 212MB index is
much too slow. Another reason could of course be my ISP.
Here is my code:
IndexSearcher[] searchers = new IndexSearcher[3];
String path = "/home/sn/public_html/";
searchers[0] = new IndexSearcher(path + "index1");
searchers[1] = new IndexSearcher(path + "index2");
searchers[2] = new IndexSearcher(path + "index3");
MultiSearcher searcher = new MultiSearcher(searchers);
Above you've opened the searchers for each search, exactly as I feared. This
is a major hit. Don't do this, but keep the searchers open between calls.
You can demonstrate this to yourself by returning time intervals in your
HTML page. Take one timestamp right here, one after a new dummy query that
you make up and hard-code, and one after the "real" query you already have
below. Return them all in your HTML page and take a look. I think you'll see
that the first query takes a while, and the second is very fast. And don't
iterate over all the hits (more below).
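To make the "open once, reuse" idea concrete, here's a minimal sketch in
plain Java. The Lucene classes are replaced by a hypothetical stand-in
Resource (so this is just the pattern, not real Lucene code); in your
servlet the expensive constructor call would be the MultiSearcher setup
above, done once and shared across all page requests:

```java
public class SearcherHolder {
    // Stand-in for an expensive-to-open resource such as an IndexSearcher.
    static class Resource {
        static int opens = 0;                 // how many times we paid the open cost
        Resource() { opens++; }
        String search(String q) { return "results for " + q; }
    }

    private static volatile Resource shared;

    // Open the resource once and hand the same instance to every request.
    static Resource get() {
        if (shared == null) {
            synchronized (SearcherHolder.class) {
                if (shared == null) shared = new Resource();
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Two "page requests": only the first one pays the open cost.
        SearcherHolder.get().search("foo");
        SearcherHolder.get().search("bar");
        System.out.println(Resource.opens);   // prints 1
    }
}
```

The same instance is safe to share across threads, which is exactly why
keeping one searcher open between queries works.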
QueryParser parser = new QueryParser("content", new StandardAnalyzer());
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
Query query = parser.parse("urlName:" + userInput + " OR content:" + userInput);
Hits hits = searcher.search(query);
for (int i = 0; i < hits.length(); i++)
{
    Document doc = hits.doc(i);
}
what is the purpose of the iteration above? It does nothing except waste
time. I'd just remove it (unless there's something else you're doing here
that you left out). If you're trying to get to startPoint below, there's no
reason to iterate above; just go directly to the loop below. For 5000 hits,
you're repeating the search 50 times or so, as has been discussed in these
archives repeatedly. See my previous mail.....
// Print only 10 results per page, without running past the last hit
int end = Math.min(startPoint + 10, hits.length());
for (int i = startPoint; i < end; i++)
{
    Document doc = hits.doc(i);
    out.println(escapeHTML(doc.get("description")) + "<p>");
    out.println("<a href=" + doc.get("url") + ">" + doc.get("url").substring(7) + "</a>");
    out.println("<p><p><p>");
}
Perhaps somebody sees the reason why it is so slow.
Thank you in advance.
Greetings, Gaston
I'm assuming that your ISP comment is just about where you're getting your
page from, and that your searchers and indexes are at least on the same
network and NOT separated by the web, as that would be slow and hard to fix.

To get a sense of where you're really spending your time, I'd actually get
the system time at various points in the process and send the *times* back
in your HTML page. That'll give you a much better sense of where you're
actually spending time. You can't really tell anything by measuring how long
it takes to get your HTML page back; you've *got* to measure at discrete
points in the code and return those.
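A small helper along these lines would do it. This is just a sketch (the
StageTimer class and its stage names are made up for illustration); the
point is to take a timestamp at each stage and return the deltas in the
page rather than eyeballing total response time:

```java
import java.util.ArrayList;
import java.util.List;

// Collects named timestamps and reports the interval between each pair.
public class StageTimer {
    private final List<String> labels = new ArrayList<String>();
    private final List<Long> stamps = new ArrayList<Long>();

    // Record a timestamp; the label names the stage that ends here.
    public void mark(String label) {
        labels.add(label);
        stamps.add(System.currentTimeMillis());
    }

    // One "label: N ms" entry per interval between consecutive marks.
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < stamps.size(); i++) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(labels.get(i)).append(": ")
              .append(stamps.get(i) - stamps.get(i - 1)).append(" ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        StageTimer t = new StageTimer();
        t.mark("start");
        Thread.sleep(50);                // stands in for opening the searcher
        t.mark("open searcher");
        Thread.sleep(10);                // stands in for the real query
        t.mark("query");
        System.out.println(t.report());  // e.g. "open searcher: 52 ms, query: 11 ms"
    }
}
```

In the servlet you'd call mark() after opening the searcher, after the
dummy query, and after the real query, then print report() into the page.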
5,000+ results should not be taking 20 seconds. I strongly suspect that the
fact that you're opening your searchers every time and uselessly iterating
through all the hits is the culprit. If I remember correctly, and you have
5,000 documents, you're executing the query about 50 times when you iterate
through all the hits. Under the covers, Hits is optimized for about 100
results; as you iterate through, each "next 100" re-executes the query. You
could search the mail archive for this topic, maybe for "hits slow" or some
such, for a fuller explanation.
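A back-of-the-envelope simulation of that cost model (not real Lucene code;
the doc() method here just mimics the "re-run the search for each next 100"
behavior described above) shows why the full loop hurts:

```java
// Simulates the Hits cost model: asking for result i re-executes the
// search whenever i crosses into a block of 100 not yet fetched.
public class HitsCostModel {
    int searchesExecuted = 0;   // how many times the full query "ran"
    int fetched = 0;            // how many results have been fetched so far

    void doc(int i) {
        while (i >= fetched) {
            searchesExecuted++; // each "next 100" costs one full search
            fetched += 100;
        }
    }

    public static void main(String[] args) {
        HitsCostModel model = new HitsCostModel();
        int totalHits = 5260;                           // Gaston's result count
        for (int i = 0; i < totalHits; i++) model.doc(i); // the wasteful full loop
        System.out.println(model.searchesExecuted);     // prints 53
    }
}
```

Paging through only 10 results per request, by contrast, touches a single
block and pays for one search.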
Hope this helps
Erick
Erick Erickson schrieb:
> Well, my index is over 1.4G, and others are reporting very large indexes
> in the 10s of gigabytes. So I suspect your index size isn't the issue.
> I'd be very, very, very surprised if it was.
>
> Three things spring immediately to mind.
>
> First, opening an IndexSearcher is a slow operation. Are you opening a
> new IndexSearcher for each query? If so, don't <G>. You can re-use the
> same searcher across threads without fear and you should *definitely*
> keep it open between queries.
>
> Second, your query could just be very, very interesting. It would be more
> helpful if you posted an example of the code where you take your timings
> (including opening the IndexSearcher).
>
> Third, if you're using a Hits object to iterate over many documents, be
> aware that it re-executes the query every hundred results or so. You
> want to use one of the HitCollector/TopDocs/TopDocsCollector classes if
> you are iterating over all the returned documents. And you really *don't*
> want to do an IndexReader.doc(doc#) or Searcher.doc(doc#) on every
> document.
>
> If none of this helps, please post some code fragments and I'm sure
> others will chime in.
>
> Best
> Erick
>
> On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote:
>
>>
>> Hi,
>>
>> Lucene itself has a volatile caching mechanism provided by a
>> WeakHashMap. Is there a possibility to serialize the Hits object? I am
>> thinking of a HashMap that, for each query, caches the first 100
>> results. Is it possible to implement such a feature, or is there such
>> an extension? My problem is that searching my application's 212MB
>> index takes too much time, even though I changed the default Boolean
>> operator from OR to AND.
>>
>> I am happy about every suggestion.
>>
>> Greetings
>>
>> Gaston.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>