See below. On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote:
hi, first thank you for the fast reply. I use MultiSearcher that opens 3 indexes, so this surely makes the whole operation slower, but 20 seconds for 5260 results out of a 212MB index is much too slow. Another reason can of course be my ISP. Here is my code:

IndexSearcher[] searchers;
searchers=new IndexSearcher[3];
String path="/home/sn/public_html/";
searchers[0]=new IndexSearcher(path+"index1");
searchers[1]=new IndexSearcher(path+"index2");
searchers[2]=new IndexSearcher(path+"index3");
MultiSearcher searcher=new MultiSearcher(searchers);

Above you've opened the searchers for each search, exactly as I feared. This is a major hit. Don't do this; keep the searchers open between calls. You can demonstrate this to yourself by returning time intervals in your HTML page. Take one timestamp right here, one after a new dummy query that you make up and hard-code, and one after the "real" query you already have below. Return them all in your HTML page and take a look. I think you'll see that the first query takes a while, and the second is very fast. And don't iterate over all the hits (more below).

QueryParser parser=new QueryParser("content",new StandardAnalyzer());
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
Query query=parser.parse("urlName:"+userInput+" OR "+"content:"+userInput);
Hits hits=searcher.search(query);
for(int i=0;i<hits.length();i++)
{
    Document doc=hits.doc(i);
}

What is the purpose of the iteration above? It does nothing except waste time. I'd just remove it (unless there's something else you're doing here that you left out). If you're trying to get to the startPoint below, there's no reason to iterate above; just go directly to the loop below. For 5000 hits, you're repeating the search 50 times or so, as has been discussed in these archives repeatedly. See my previous mail.

// Outprint only 10 results per page
for(int i=startPoint;i<startPoint+10;i++)
{
    Document doc=hits.doc(i);
    out.println(escapeHTML(doc.get("description"))+"<p>");
    out.println("<a href="+doc.get("url")+">"+doc.get("url").substring(7)+"</a>");
    out.println("<p><p><p>");
}

Perhaps somebody sees the reason why it is so slow. Thank you in advance

Greetings
Gaston
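
To make the two suggestions above concrete (keep the searchers open between calls, and return timestamps in the page), here is a minimal sketch. It is deliberately not Lucene code: SearcherHolder, Checkpoints, and the Supplier that stands in for "new MultiSearcher(searchers)" are hypothetical names made up for illustration, and the sketch assumes the opener never returns null. Treat it as one possible shape for the idea rather than the way it has to be done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Opens an expensive resource once and hands the same instance to every caller. */
final class SearcherHolder<T> {
    private final Supplier<T> opener; // hypothetical stand-in for "new MultiSearcher(searchers)"
    private T instance;               // kept open between requests
    private int openCount;            // how many times the opener actually ran

    SearcherHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it only on the first call. */
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
            openCount++;
        }
        return instance;
    }

    synchronized int openCount() {
        return openCount;
    }
}

/** Records named checkpoints so the intervals between them can be printed into the page. */
final class Checkpoints {
    private final List<String> labels = new ArrayList<>();
    private final List<Long> nanos = new ArrayList<>();

    /** Takes a timestamp and remembers it under the given label. */
    void mark(String label) {
        labels.add(label);
        nanos.add(System.nanoTime());
    }

    /** One line per interval between consecutive checkpoints, in milliseconds. */
    List<String> report() {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i < labels.size(); i++) {
            long ms = (nanos.get(i) - nanos.get(i - 1)) / 1_000_000;
            lines.add(labels.get(i - 1) + " -> " + labels.get(i) + ": " + ms + " ms");
        }
        return lines;
    }
}
```

In a servlet, the holder would live in a static field or in application scope so that every request shares it. Call mark() before the first search, after the dummy query, and after the real query, then print each line of report() into the HTML page.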

I'm assuming that your ISP comment is just about where you're getting your page from, and that your searchers and indexes are at least on the same network and NOT separated by the web, as that would be slow and hard to fix.

To get a sense of where you're really spending your time, I'd get the system time at various points in the process and send the *times* back in your HTML page. You can't really tell anything by measuring how long it takes to get your HTML page back; you've *got* to measure at discrete points in the code and return those.

5,000+ results should not be taking 20 seconds. I strongly suspect that the fact that you're opening your searchers every time and uselessly iterating through all the hits is the culprit. If I remember correctly, and you have 5,000 matching documents, you're executing the query about 50 times when you iterate through all the hits. Under the covers, Hits is optimized for about 100 results. As you iterate through, each "next 100" re-executes the query. You could search the mail archive for this topic, maybe "hits slow" or some such, for fuller explanations.

Hope this helps
Erick

Erick Erickson wrote:
> Well, my index is over 1.4G, and others are reporting very large indexes in
> the 10s of gigabytes. So I suspect your index size isn't the issue. I'd be
> very, very, very surprised if it was.
>
> Three things spring immediately to mind.
>
> First, opening an IndexSearcher is a slow operation. Are you opening a new
> IndexSearcher for each query? If so, don't <G>. You can re-use the same
> searcher across threads without fear and you should *definitely* keep it
> open between queries.
>
> Second, your query could just be very, very interesting. It would be more
> helpful if you posted an example of the code where you take your timings
> (including opening the IndexSearcher).
>
> Third, if you're using a Hits object to iterate over many documents, be
> aware that it re-executes the query every hundred results or so. You want to
> use one of the HitCollector/TopDocs/TopDocsCollector classes if you are
> iterating over all the returned documents. And you really *don't* want to do
> an IndexReader.doc(doc#) or Searcher.doc(doc#) on every document.
>
> If none of this helps, please post some code fragments and I'm sure others
> will chime in.
>
> Best
> Erick
>
> On 9/26/06, Gaston <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Lucene itself has a volatile caching mechanism provided by a weak
>> HashMap. Is there a possibility to serialize the Hits Object? I think of
>> a HashMap that for each found result, caches the first 100 results. Is
>> it possible to implement such a feature or is there such an extension?
>> My problem is that the searching of my application with an index with
>> the size of 212MB takes too much time, even though I set the
>> BooleanOperator from OR to AND
>>
>> I am happy about every suggestion.
>>
>> Greetings
>>
>> Gaston.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]