Inferring out on the end of a long and fragile limb...

Do you get information from the database in any of the calls in your indexing loop? That is, do any of...

itr.next();
content.getStoryText(),
content.getStoryIdentity()
content.getHeadline1()

go out to the DB to get info, and could that be where the time is spent?

Erick

On 4/12/07, Tony Qian <[EMAIL PROTECTED]> wrote:
Otis,

I timed just for indexing.

thanks,
Tony

>From: Otis Gospodnetic <[EMAIL PROTECTED]>
>Reply-To: [EMAIL PROTECTED]
>To: [EMAIL PROTECTED]
>Subject: Re: Index performance
>Date: Thu, 12 Apr 2007 09:31:49 -0700 (PDT)
>
>Hi Tony,
>Your code looks fine to me. I'm not sure what you timed - the whole app
>run, just indexing, indexing + optimizing... If you timed indexing +
>optimizing, leave optimization out of the timer. How long do you think
>this should take? Try setting maxBufferedDocs to 90.
>
>Otis
>Simpy -- http://www.simpy.com/ - Tag - Search - Share
>
>----- Original Message ----
>From: Tony Qian <[EMAIL PROTECTED]>
>To: [EMAIL PROTECTED]
>Sent: Thursday, April 12, 2007 11:23:36 AM
>Subject: Index performance
>
>All,
>
>Sorry for the long email. I have two questions on indexing. My data consists
>of an id, a short headline and story text. The story text has some HTML tags.
>Here is an example.
>
>In early 2005, it seemed that Shamita Shetty had finally arrived after a
>high profile debut in <i>Mohabbatein</i> [2000]. <br /><br />With 3 of her
>films releasing in the first half of 2005, <i>Bewafa, Zeher </i>and
><i>Fareb</i>, and the first two ending up making good money, it seemed that
>the gorgeous girl had finally started making her presence felt. <i>Zeher</i>
>helped her being recognized as an actor and her fans had all the reasons to
>believe that they would be seeing more of her in the coming months.
><br /><br />Surprisingly there has been absolutely no movement ever since then
>from Shamita's end as she hasn't had a single release in almost 2
>years now. All of this would change though with the arrival of <i>Cash</i>
>where she is one of the leading ladies apart from Esha Deol and Dia Mirza.
><br /><br />An action thriller popcorn entertainer, the film is directed by
>Anubahv Sinha of <i>Dus</i> fame and stars Ajay Devgan, Suneil Shetty,
>Ritesh Deshmukh and Zayed Khan in the lead.<br /><br />
>
>I tried to index it. It took 7-10 seconds to index about 90 documents.
>Here is my code:
>
>  static void indexContents(IndexWriter writer, List storyContentList)
>      throws IOException {
>    if (storyContentList != null && storyContentList.size() != 0) {
>      try {
>        Iterator itr = storyContentList.iterator();
>        while (itr.hasNext()) {
>          StoryContents content = (StoryContents) itr.next();
>          Document document = new Document();
>          document.add(new Field("storyText", content.getStoryText(),
>              Field.Store.YES, Field.Index.TOKENIZED));
>          document.add(new Field("storyIdentity",
>              String.valueOf(content.getStoryIdentity()),
>              Field.Store.YES, Field.Index.NO));
>          document.add(new Field("headline1",
>              String.valueOf(content.getHeadline1()),
>              Field.Store.YES, Field.Index.NO));
>          writer.addDocument(document);
>        }
>      } catch (Exception ex) {
>        System.out.println(" caught a " + ex.getClass());
>      }
>    }
>  }
>
>I opened one IndexWriter at the very beginning:
>IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);
>
>I called optimize and closed the IndexWriter after indexing the documents:
>writer.optimize();
>writer.close();
>
>My question is why it took so long. Do I need to follow the instructions in
>"How can I index HTML documents?" in the FAQ on the Lucene web site?
>
>Another question is whether I can delete a document based on the
>storyIdentity field (using IndexReader.deleteDocuments(term)). Since the
>storyIdentity field is not indexed, is there any performance issue, or
>should I index it too (and store it)?
>
>Appreciate your help.
>
>Tony
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
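
One way to check the two suggestions in this thread (time only the indexing, and find out whether the getters are where the time goes) is to time the data access and the indexing call separately inside the loop. What follows is only a rough sketch under stated assumptions: SplitTimer and its fields are hypothetical names, not part of Lucene, and the two function arguments merely stand in for the content.getStoryText()-style getters and for writer.addDocument(document). It uses plain Java 8 and no Lucene classes so that it runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper (not a Lucene class): times data access and indexing separately. */
class SplitTimer {
    /** Total nanoseconds spent fetching values, e.g. getters that might go out to a DB. */
    long fetchNanos = 0;
    /** Total nanoseconds spent in the indexing step, e.g. a call like writer.addDocument(...). */
    long indexNanos = 0;
    /** Number of items processed. */
    int count = 0;

    <T, D> void run(List<T> items, Function<T, D> fetch, Consumer<D> index) {
        for (T item : items) {
            long t0 = System.nanoTime();
            D doc = fetch.apply(item);   // stand-in for reading the story fields
            long t1 = System.nanoTime();
            index.accept(doc);           // stand-in for adding the document to the index
            long t2 = System.nanoTime();
            fetchNanos += t1 - t0;
            indexNanos += t2 - t1;
            count++;
        }
    }

    public static void main(String[] args) {
        // Assumed sample data; in the real loop these would be StoryContents objects.
        List<String> stories = Arrays.asList("story one", "story two", "story three");
        Function<String, String> fetch = s -> s.toUpperCase();
        Consumer<String> index = d -> { /* the real indexing call would go here */ };

        SplitTimer timer = new SplitTimer();
        timer.run(stories, fetch, index);
        System.out.println("items: " + timer.count);
        System.out.println("fetch ms: " + timer.fetchNanos / 1_000_000.0);
        System.out.println("index ms: " + timer.indexNanos / 1_000_000.0);
    }
}
```

If the fetch total dominates, the time is going into data access rather than into the index writer. Keeping the optimize() and close() calls outside both timers, as Otis suggests, keeps them from skewing the indexing figure.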