Shall I set mergeFactor = 2? Slow indexing is not a bother.
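As a sketch of what that tuning looks like (assuming the Lucene 2.x API current at the time of this thread, and a hypothetical index path): a low mergeFactor keeps the index in fewer segments, which slows indexing but can speed up searches, and `optimize()` collapses the index to a single segment.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class TuneIndex {
    public static void main(String[] args) throws Exception {
        // false = open an existing index rather than create a new one
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        writer.setMergeFactor(2);   // default is 10; lower = fewer segments
        // ... add documents here ...
        writer.optimize();          // merge down to one segment before searching
        writer.close();
    }
}
```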
On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
>
> I ran some tests and it seems that the slowness is from Lucene calls: when
> I do "doBodySearch", Lucene takes about 50 seconds; if I remove that call,
> it gives me results in 5 seconds.
>
> But I need to do the body search, and that field contains lots of text. The
> field is <contents>. How can I optimize that?
>
> thanks,
> Askar
>
> On 7/24/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> >
> > Sorry, I mistyped. I don't mean the getXXXX methods, I mean
> > doTagSearch, doTitleSearch, etc.
> >
> > As for the stop watch, not really sure what to make of that... Try
> > System.currentTimeMillis()...
> >
> > You can get just the fields you want when loading a Document by using
> > the FieldSelector API on IndexReader, etc.
> >
> > Perhaps you can also use some Filters and cache them.
> >
> > It's really hard to give suggestions when it is not at all obvious
> > where the slowness is. Please try to isolate the Lucene calls from
> > the DB calls and look at the timings for both.
> >
> > On Jul 24, 2007, at 5:28 PM, Askar Zaidi wrote:
> >
> > > Thanks for the reply.
> > >
> > > I am timing the entire search process with a stop watch, a bit
> > > ghetto style. My getXXX methods are:
> > >
> > > Document doc = hits.doc(i);
> > > String str = doc.get("item");
> > >
> > > So you can see that I am retrieving the entire document in a search
> > > query. Ideally, I'd like to retrieve just the Field object that I
> > > want to run the search on. I know this will give me a boost, as one
> > > of my fields is really huge.
> > >
> > > My query is selecting the entire user data-set in the database. I'd
> > > like to do some SQL-based search in the query too, so that I pick
> > > only those items where the phrase matches.
> > >
> > > Index contains about 650MB of data. Index file size is 14478869 bytes.
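The FieldSelector suggestion above can be sketched like this (assuming Lucene 2.1+, where `IndexReader.document(int, FieldSelector)` and `MapFieldSelector` exist; the index path and field names follow the thread's schema):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.MapFieldSelector;
import org.apache.lucene.index.IndexReader;

public class LoadSelectedFields {
    public static void main(String[] args) throws Exception {
        // Load only the small stored fields; the huge <contents> field
        // is never read from disk at all.
        FieldSelector selector =
                new MapFieldSelector(new String[] { "itemID", "title" });
        IndexReader reader = IndexReader.open("/path/to/index");
        Document doc = reader.document(0, selector);
        String item = doc.get("itemID");   // loaded
        String body = doc.get("contents"); // null -- skipped by the selector
        reader.close();
    }
}
```

This matters here because `doc.get(...)` on a fully loaded Document pays the I/O cost of every stored field, including <contents>, even when only itemID is needed.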
> > > thanks,
> > > AZ
> > >
> > > On 7/24/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> > >>
> > >> Where are you getting your numbers from? That is, where are your
> > >> timers? Are you timing the rs.next() loop, or the individual calls
> > >> to Lucene? What do the getXXXXX methods look like? How big are your
> > >> queries? How big is your index?
> > >>
> > >> Essentially, we need more info to really help you. From what I can
> > >> tell, you are generating 3 different Lucene queries for each record
> > >> in the database. Frankly, I'm surprised your slowdown is only linear.
> > >>
> > >> On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote:
> > >>
> > >>> I have 512MB of RAM allocated to the JVM heap. If I increase my
> > >>> system RAM from 768MB to say 2GB or so, and give the JVM 1.5GB of
> > >>> heap space, will I get quicker results?
> > >>>
> > >>> Can I expect results which take 1 minute to be returned in 30
> > >>> seconds with more RAM? Should I also get a more powerful CPU? A
> > >>> real server-class machine?
> > >>>
> > >>> I have also done some of the optimizations that are mentioned on
> > >>> the Lucene website.
> > >>>
> > >>> thanks,
> > >>> AZ
> > >>>
> > >>> On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote:
> > >>>>
> > >>>> Hey Guys,
> > >>>>
> > >>>> I just finished up using Lucene in my application. I have data in
> > >>>> a database, so while indexing I extract this data from the
> > >>>> database and pump it into the index. Specifically, I have the
> > >>>> following fields in the index:
> > >>>>
> > >>>> <itemID> <tags> <title> <summary> <contents>
> > >>>>
> > >>>> where itemID is just a number (the primary key in the DB)
> > >>>> tags: text
> > >>>> title: text
> > >>>> summary: text
> > >>>> contents: huge text (text extracted from files: PDFs, docs, etc.).
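Grant's advice to isolate the Lucene timings from the DB timings with System.currentTimeMillis() can be sketched as a small harness (the stand-in Runnables are hypothetical placeholders for the JDBC work and the doXXXSearch calls):

```java
// Accumulate DB time and Lucene time separately across the rs.next()
// loop, instead of one stopwatch around the whole search.
public class SearchTimer {
    long dbMs = 0;
    long searchMs = 0;

    void timeOneIteration(Runnable dbCall, Runnable luceneCall) {
        long t0 = System.currentTimeMillis();
        dbCall.run();                          // stand-in for rs.next() / JDBC work
        dbMs += System.currentTimeMillis() - t0;

        long t1 = System.currentTimeMillis();
        luceneCall.run();                      // stand-in for doTagSearch etc.
        searchMs += System.currentTimeMillis() - t1;
    }

    public static void main(String[] args) {
        SearchTimer timer = new SearchTimer();
        for (int i = 0; i < 300; i++) {
            timer.timeOneIteration(() -> { }, () -> { });
        }
        System.out.println("DB: " + timer.dbMs
                + " ms, Lucene: " + timer.searchMs + " ms");
    }
}
```

With the two totals printed separately, it becomes obvious whether the 50 seconds is spent in the database loop or in the per-item Lucene calls.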
> > >>>> Now while running a search query I realized that the response
> > >>>> time increases in a linear fashion as the number of <itemID>s
> > >>>> increases in the DB.
> > >>>>
> > >>>> If I have 50 items, it's 8 seconds;
> > >>>> 100 items, it's 17 seconds;
> > >>>> 300+ items, it's 60 seconds and maybe more.
> > >>>>
> > >>>> In a perfect world, I'd like to search on 300+ items within
> > >>>> 10-15 seconds. Can anyone give me tips to fine-tune Lucene?
> > >>>>
> > >>>> Here's a code snippet:
> > >>>>
> > >>>> String sql = "SELECT itemID FROM items WHERE creator = 'askar'";
> > >>>>
> > >>>> // execute query
> > >>>>
> > >>>> while (rs.next()) {
> > >>>>
> > >>>>     score = doTagSearch(askar, text, itemID);
> > >>>>     scoreTitle = doTitleSearch(askar, text, itemID);
> > >>>>     scoreSummary = doSummarySearch(askar, text, itemID);
> > >>>>
> > >>>>     // ...
> > >>>>
> > >>>> }
> > >>>>
> > >>>> So this code asks Lucene to search for the "text" in the itemID
> > >>>> passed. itemID is already indexed. The while loop will run 300
> > >>>> times if there are 300 items... that gets slow... what can I do
> > >>>> here??
> > >>>>
> > >>>> thanks for the replies,
> > >>>>
> > >>>> AZ
> > >>>>
> > >>
> > >> --------------------------
> > >> Grant Ingersoll
> > >> Center for Natural Language Processing
> > >> http://www.cnlp.org/tech/lucene.asp
> > >>
> > >> Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [EMAIL PROTECTED]
> > >> For additional commands, e-mail: [EMAIL PROTECTED]
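The structural fix the thread is circling around is to stop issuing three Lucene queries per database row and instead run one query over all fields, then keep only the hits whose itemID belongs to the creator. A sketch, assuming the Lucene 2.x API of the time (`MultiFieldQueryParser.parse`, `Hits`) and a hypothetical index path; `allowedIds` would be filled from the JDBC result set:

```java
import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class OneQuerySearch {
    public static void main(String[] args) throws Exception {
        // itemIDs for creator 'askar', collected once from SQL
        Set allowedIds = new HashSet();

        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        // One query across all four fields instead of 300 x 3 queries
        Query q = MultiFieldQueryParser.parse("text to find",
                new String[] { "tags", "title", "summary", "contents" },
                new StandardAnalyzer());
        Hits hits = searcher.search(q);
        for (int i = 0; i < hits.length(); i++) {
            String itemID = hits.doc(i).get("itemID");
            if (allowedIds.contains(itemID)) {
                float score = hits.score(i);
                // record (itemID, score)
            }
        }
        searcher.close();
    }
}
```

This makes the Lucene cost proportional to the number of matching documents rather than the number of rows in the `items` table, which is what turns the linear 300-iteration slowdown into a single search.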