I ran some tests, and it seems the slowness comes from the Lucene call in doBodySearch: if I remove that call, Lucene gives me results in 5 seconds; otherwise it takes about 50 seconds.
But I need to do the body search, and that field contains a lot of text. The field is <contents>. How can I optimize that?

thanks,
Askar

On 7/24/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>
> Sorry, I mistyped. I don't mean the getXXXX methods, I mean
> doTagSearch, doTitleSearch, etc.
>
> As for the stop watch, not really sure what to make of that... Try
> System.currentTimeMillis()...
>
> You can get just the fields you want when loading a Document by using
> the FieldSelector API on IndexReader, etc.
>
> Perhaps you can also use some Filters and cache them.
>
> It's really hard to give suggestions when it is not at all obvious
> where the slowness is. Please try to isolate the Lucene calls from
> the DB calls and look at the timings for both.
>
> On Jul 24, 2007, at 5:28 PM, Askar Zaidi wrote:
>
> > Thanks for the reply.
> >
> > I am timing the entire search process with a stop watch, a bit
> > ghetto style. My getXXX methods are:
> >
> >     Document doc = hits.doc(i);
> >     String str = doc.get("item");
> >
> > So you can see that I am retrieving the entire document in a search
> > query. Ideally, I'd like to just retrieve the Field object that I
> > want to run the search on. I know this will give me a boost, as one
> > of my Fields is really huge.
> >
> > My query is selecting the entire user data-set in the database. I'd
> > like to do some SQL-based search in the query too, so that I pick
> > only those items where the phrase matches.
> >
> > Index contains about 650MB of data. Index file size is 14478869 bytes.
> >
> > thanks,
> > AZ
> >
> > On 7/24/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> >>
> >> Where are you getting your numbers from? That is, where are your
> >> timers? Are you timing the rs.next() loop, or the individual calls
> >> to Lucene? What do the getXXXXX methods look like? How big are your
> >> queries? How big is your index?
> >>
> >> Essentially, we need more info to really help you.
> >> From what I can tell, you are generating 3 different Lucene
> >> queries for each record in the database. Frankly, I'm surprised
> >> your slowdown is only linear.
> >>
> >> On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote:
> >>
> >>> I have 512MB RAM allocated to the JVM heap. If I double my system
> >>> RAM from 768MB to say 2GB or so, and give the JVM 1.5GB of heap
> >>> space, will I get quicker results?
> >>>
> >>> Can I expect results that take 1 minute to be returned in 30
> >>> seconds with more RAM? Should I also get a more powerful CPU? A
> >>> real server-class machine?
> >>>
> >>> I have also done some of the optimizations that are mentioned on
> >>> the Lucene website.
> >>>
> >>> thanks,
> >>> AZ
> >>>
> >>> On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> Hey Guys,
> >>>>
> >>>> I just finished up using Lucene in my application. I have data in
> >>>> a database, so while indexing I extract this data from the
> >>>> database and pump it into the index. Specifically, I have the
> >>>> following data in the index:
> >>>>
> >>>> <itemID> <tags> <title> <summary> <contents>
> >>>>
> >>>> where itemID is just a number (the primary key in the DB),
> >>>> tags: text
> >>>> title: text
> >>>> summary: text
> >>>> contents: huge text (text extracted from files: PDFs, docs, etc.)
> >>>>
> >>>> Now, while running a search query, I realized that the response
> >>>> time increases linearly as the number of <itemID>s in the DB
> >>>> increases:
> >>>>
> >>>> 50 items: 8 seconds
> >>>> 100 items: 17 seconds
> >>>> 300+ items: 60 seconds, and maybe more
> >>>>
> >>>> In a perfect world, I'd like to search on 300+ items within 10-15
> >>>> seconds. Can anyone give me tips to fine-tune Lucene?
> >>>>
> >>>> Here's a code snippet:
> >>>>
> >>>> String sql = "SELECT itemID FROM items WHERE creator = 'askar'";
> >>>>
> >>>> // execute query
> >>>>
> >>>> while (rs.next()) {
> >>>>     int itemID = rs.getInt("itemID");
> >>>>     score = doTagSearch("askar", text, itemID);
> >>>>     scoreTitle = doTitleSearch("askar", text, itemID);
> >>>>     scoreSummary = doSummarySearch("askar", text, itemID);
> >>>>     // ...
> >>>> }
> >>>>
> >>>> So this code asks Lucene to search for the "text" in the itemID
> >>>> passed. itemID is already indexed. The while loop will run 300
> >>>> times if there are 300 items... that gets slow. What can I do
> >>>> here??
> >>>>
> >>>> thanks for the replies,
> >>>>
> >>>> AZ
> >>>>
> >>
> >> --------------------------
> >> Grant Ingersoll
> >> Center for Natural Language Processing
> >> http://www.cnlp.org/tech/lucene.asp
> >>
> >> Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> >> For additional commands, e-mail: [EMAIL PROTECTED]
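Grant's advice to isolate the Lucene timings from the DB timings with System.currentTimeMillis() can be sketched as below. This is a minimal sketch, not the poster's actual code: doTagSearch here is a hypothetical stub standing in for the real Lucene call, and only the accumulation pattern is the point.

```java
// Sketch of isolating Lucene-only time from DB time, per Grant's suggestion.
// doTagSearch is a STUB standing in for the real Lucene search call.
public class SearchTimer {
    static long luceneMillis = 0; // total time spent inside Lucene calls

    // Placeholder for the real doTagSearch(user, text, itemID) Lucene call.
    static float doTagSearch(String user, String text, int itemID) {
        return 0.0f; // stub result
    }

    // Wrap each Lucene call so its elapsed time is accumulated separately
    // from whatever the surrounding rs.next() DB loop costs.
    static float timedTagSearch(String user, String text, int itemID) {
        long start = System.currentTimeMillis();
        float score = doTagSearch(user, text, itemID);
        luceneMillis += System.currentTimeMillis() - start;
        return score;
    }

    public static void main(String[] args) {
        for (int itemID = 0; itemID < 300; itemID++) {
            timedTagSearch("askar", "some text", itemID);
        }
        // Compare this against a similar accumulator around the DB calls.
        System.out.println("Lucene time: " + luceneMillis + " ms");
    }
}
```

Whichever accumulator dominates tells you where to optimize first.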
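On the loop itself: a common fix for a per-row search loop like the snippet above is to run a single Lucene query for the whole result set and map the hits back to itemIDs afterward, so the DB loop does only O(1) lookups. The sketch below shows just that structure with the actual Lucene search stubbed out; searchOnce is a hypothetical stand-in for one IndexSearcher.search() call that reads the stored itemID field from each hit.

```java
import java.util.HashMap;
import java.util.Map;

// Structural sketch: ONE search for all items, then cheap map lookups,
// instead of one Lucene query per database row.
public class OnePassSearch {

    // STUB standing in for a single Lucene search returning itemID -> score.
    // In real code this would run one query over the user's items and
    // collect the stored "itemID" field and score from each hit.
    static Map<Integer, Float> searchOnce(String user, String text) {
        Map<Integer, Float> scores = new HashMap<Integer, Float>();
        scores.put(42, 1.5f); // pretend item 42 matched the text
        return scores;
    }

    public static void main(String[] args) {
        Map<Integer, Float> scores = searchOnce("askar", "some text");
        // The DB loop now only looks up precomputed scores per itemID.
        int[] itemIDs = {41, 42, 43};
        for (int id : itemIDs) {
            float score = scores.containsKey(id) ? scores.get(id) : 0.0f;
            System.out.println("item " + id + " score " + score);
        }
    }
}
```

The key design change is that search cost no longer scales with the number of database rows, only with the size of the single query's result set.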