To add to all these excellent suggestions: I would suggest creating a "baby index" out of the master index. Pull out, say, 1000 docs into a test index and run the same queries against it. That helps narrow down whether the problem is the index size or the queries themselves.
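A rough sketch of how the 1000 docs could be picked so the sample is spread across the whole index rather than taken from the first segment only. This is plain Java with no Lucene calls: the class and method names (BabyIndexSample, pickDocIds) are made up for illustration, maxDoc stands in for whatever your reader reports as its document count, and deleted documents are not handled. Copying the chosen docs into the test index is left out on purpose; keep in mind that reading a document back from an index only gives you its stored fields, so you may need to re-index those docs from the original source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: chooses which document
// numbers go into the small test index.
class BabyIndexSample {

    /** Returns up to sampleSize document numbers spread evenly over [0, maxDoc). */
    static List<Integer> pickDocIds(int maxDoc, int sampleSize) {
        List<Integer> ids = new ArrayList<>();
        if (maxDoc <= 0 || sampleSize <= 0) {
            return ids; // nothing to sample
        }
        int n = Math.min(maxDoc, sampleSize);
        for (int i = 0; i < n; i++) {
            // long arithmetic avoids int overflow when maxDoc is large
            ids.add((int) ((long) i * maxDoc / n));
        }
        return ids;
    }

    public static void main(String[] args) {
        // 50M documents, as in the original post; 1000-doc sample.
        List<Integer> ids = pickDocIds(50_000_000, 1000);
        System.out.println(ids.size() + " docs, first=" + ids.get(0)
                + ", last=" + ids.get(ids.size() - 1));
    }
}
```

If the same expanded query is fast against the small index and slow against the full one, the cost is scaling with index size; if it is slow on both, the query expansion itself is the first thing to look at.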

On Tue, Aug 4, 2009 at 8:55 AM, Matthew Hall <mh...@informatics.jax.org> wrote:
> Also, how long does it take Luke to do a search against the same index.
>
> That way you can remove any of the timing that your application is adding
> into the mix.
>
> If Luke doesn't take the minimum of 8 seconds... then you know its an issue
> with your app. (or at least a large part of it)
>
> Matt
>
> Ian Lea wrote:
>>
>> Still surprising that your searches are taking so long.
>>
>> Have you worked through everything on
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed, suggested by
>> someone earlier in this thread? Are you sure that the problem is
>> really with lucene? Is it the search itself that takes a long time, or
>> retrieving data for the hits? What does query.toString() look like?
>> How many hits does a search typically match? Is a search on document
>> id effectively instant?
>>
>> You have to supply more detail if you want better answers.
>>
>> --
>> Ian.
>>
>> On Tue, Aug 4, 2009 at 12:21 PM, prashant
>> ullegaddi <prashullega...@gmail.com> wrote:
>>>
>>> Shahi,
>>>
>>> Our queries are free text queries. But they will be expanded into:
>>> Multifield, Boolean.
>>> We are also expanding the original query using SynExpand of lucene. A
>>> simple query gets expanded to say a query of page size.
>>>
>>> And we are not storing any other fields except key (document IDs), target
>>> URLs and titles.
>>>
>>> Prashant.
>>>
>>> On Tue, Aug 4, 2009 at 1:31 PM, Shashi Kant <shashi....@gmail.com> wrote:
>>>>
>>>> Prashant, I have had better luck with even larger sized indices on
>>>> similar platforms. Could you elaborate what types of queries you are
>>>> running, Multifield? Boolean? combinations? etc. Also you might want
>>>> to remove unnecessary stored fields from the index and move them to a
>>>> relational db to squeeze out better performance.
>>>>
>>>> Shashi
>>>>
>>>> On Tue, Aug 4, 2009 at 3:18 AM, prashant
>>>> ullegaddi <prashullega...@gmail.com> wrote:
>>>>>
>>>>> I did that as well. Actually, we had 32 indexes initially. We searched
>>>>> them. It was even horrible.
>>>>> After that I merged them into 4 indexes. And did the same. No gain!
>>>>>
>>>>> Then, I had to merge 32 indexes into one.
>>>>>
>>>>> On Tue, Aug 4, 2009 at 10:48 AM, Anshum <ansh...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Prashant,
>>>>>> 8 seconds as the minimum time is a little too much, though considering
>>>>>> you're using just 4G of RAM its still ok.
>>>>>> I would advice you to break your index into smaller indexes, perhaps
>>>>>> selectively query the indexes (if that's possible for your
>>>>>> application) and use a parallelmultisearcher. Its just something that
>>>>>> you might try and like.
>>>>>> All said and done, parallelizing would only get you a bell-curve like
>>>>>> performance graph, so you'd have to figure out the sweet spot there.
>>>>>>
>>>>>> --
>>>>>> Anshum Gupta
>>>>>> Naukri Labs!
>>>>>> http://ai-cafe.blogspot.com
>>>>>>
>>>>>> The facts expressed here belong to everybody, the opinions to me. The
>>>>>> distinction is yours to draw............
>>>>>>
>>>>>> On Tue, Aug 4, 2009 at 10:08 AM, prashant ullegaddi
>>>>>> <prashullega...@gmail.com> wrote:
>>>>>>>
>>>>>>> I'm running it on Quadcore, 2.4GHz each, 4GB RAM.
>>>>>>>
>>>>>>> Prashant.
>>>>>>>
>>>>>>> On Tue, Aug 4, 2009 at 8:38 AM, Otis Gospodnetic
>>>>>>> <otis_gospodne...@yahoo.com> wrote:
>>>>>>>>
>>>>>>>> With such a large index be prepared to put it on a server with lots
>>>>>>>> of RAM (even if you follow all the tips from the Wiki).
>>>>>>>> When reporting performance numbers, you really ought to tell us
>>>>>>>> about your hardware, types of queries, etc.
>>>>>>>>
>>>>>>>> Otis
>>>>>>>> --
>>>>>>>> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
>>>>>>>> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>>>>>>>>
>>>>>>>> ----- Original Message ----
>>>>>>>>>
>>>>>>>>> From: prashant ullegaddi <prashullega...@gmail.com>
>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>> Sent: Monday, August 3, 2009 12:33:46 AM
>>>>>>>>> Subject: How to improve search time?
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've a single index of size 87GB containing around 50M documents.
>>>>>>>>> When I search for any query,
>>>>>>>>> best search time I observed was 8sec. And when query is expanded
>>>>>>>>> with synonyms, search takes
>>>>>>>>> minutes (~ 2-3min). Is there a better way to search so that
>>>>>>>>> overall search time reduces?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Prashant.
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> mh...@informatics.jax.org
> (207) 288-6012

--
Phone# (617) 714-4775
Cell# (617) 642-6745
Google Voice# (617) 575-9264