Erick,

Sorry for the late reply; I was stuck on another project. The content object is a plain Java object, and all of its fields are already set.

Thanks,
Tony

From: "Erick Erickson" <[EMAIL PROTECTED]>
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: Index performance
Date: Thu, 12 Apr 2007 16:29:35 -0400

Going out on a long and fragile limb here... Do you fetch anything from
the database in any of the calls in your indexing loop? That is, do any
of...

  itr.next()
  content.getStoryText()
  content.getStoryIdentity()
  content.getHeadline1()

go out to the DB to get info, and could that be where the time is
spent?

Erick
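
One way to test this guess is to time the getter calls separately from
writer.addDocument(...), and to keep writer.optimize() out of both, as
suggested earlier in the thread. What follows is only a minimal sketch in
plain Java, not code from this thread: PhaseTimer and simulateWork are
hypothetical names, and the sleeps are stand-ins for the real getter,
addDocument and optimize calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Accumulates wall-clock time per named phase. Hypothetical helper, not part of Lucene. */
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the step and adds its elapsed time to the named phase, even if the step throws. */
    void time(String phase, Runnable step) {
        long start = System.nanoTime();
        try {
            step.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a phase, or 0 if it was never timed. */
    long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    /** Copy of all totals, in the order the phases were first seen. */
    Map<String, Long> snapshot() {
        return new LinkedHashMap<>(nanosByPhase);
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int i = 0; i < 90; i++) {
            // Stand-in for content.getStoryText(), getStoryIdentity(), getHeadline1().
            timer.time("getters", () -> simulateWork(1));
            // Stand-in for writer.addDocument(document).
            timer.time("addDocument", () -> simulateWork(2));
        }
        // Stand-in for writer.optimize(); it gets its own phase so it cannot
        // inflate the indexing numbers.
        timer.time("optimize", () -> simulateWork(5));

        timer.snapshot().forEach((phase, elapsed) ->
                System.out.printf("%-12s %d ms%n", phase, elapsed / 1_000_000));
    }

    /** Assumption: sleeps in place of real work, for illustration only. */
    private static void simulateWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If the "getters" total dominates, the time is going into fetching the
data rather than into Lucene.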

On 4/12/07, Tony Qian <[EMAIL PROTECTED]> wrote:

Otis,

I timed only the indexing.

Thanks,
Tony

>From: Otis Gospodnetic <[EMAIL PROTECTED]>
>Reply-To: java-user@lucene.apache.org
>To: java-user@lucene.apache.org
>Subject: Re: Index performance
>Date: Thu, 12 Apr 2007 09:31:49 -0700 (PDT)
>
>Hi Tony,
>Your code looks fine to me.  I'm not sure what you timed - the whole app
>run, just indexing, indexing + optimizing...  If you timed indexing +
>optimizing, leave optimization out of the timer.  How long do you think
>this should take?  Try setting maxBufferedDocs to 90.
>
>Otis
>. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
>----- Original Message ----
>From: Tony Qian <[EMAIL PROTECTED]>
>To: java-user@lucene.apache.org
>Sent: Thursday, April 12, 2007 11:23:36 AM
>Subject: Index performance
>
>All,
>
>Sorry for the long email. I have two questions on indexing. My data
>consists of an id, a short headline and the story text. The story text
>contains some HTML tags. Here is an example.
>
>In early 2005, it seemed that Shamita Shetty had finally arrived after a
>high profile debut in <i>Mohabbatein</i> [2000]. <br /><br />With 3 of
>her films releasing in the first half of 2005, <i>Bewafa, Zeher </i>and
><i>Fareb</i>, and the first two ending up making good money, it seemed
>that the gorgeous girl had finally started making her presence felt.
><i>Zeher</i> helped her being recognized as an actor and her fans had
>all the reasons to believe that they would be seeing more of her in the
>coming months. <br /><br />Surprisingly there has been absolutely no
>movement ever since then from Shamita's end as she hasn't had a single
>release in almost 2 years now. All of this would change though with the
>arrival of <i>Cash</i> where she is one of the leading ladies apart from
>Esha Deol and Dia Mirza. <br /><br />An action thriller popcorn
>entertainer, the film is directed by Anubahv Sinha of <i>Dus</i> fame
>and stars Ajay Devgan, Suneil Shetty, Ritesh Deshmukh and Zayed Khan in
>the lead.<br /><br />
>
>I tried to index it. It took 7-10 seconds to index about 90 documents.
>Here is my code:
>
>   static void indexContents(IndexWriter writer, List storyContentList)
>     throws IOException {
>     if (storyContentList != null && storyContentList.size() != 0) {
>         try {
>             Iterator itr = storyContentList.iterator();
>             while (itr.hasNext()){
>                 StoryContents content = (StoryContents) itr.next();
>                 Document document = new Document();
>                 document.add(new Field("storyText",
>                              content.getStoryText(),
>                              Field.Store.YES, Field.Index.TOKENIZED));
>                 document.add(new Field("storyIdentity",
>                              String.valueOf(content.getStoryIdentity()),
>                              Field.Store.YES, Field.Index.NO));
>                 document.add(new Field("headline1",
>                              String.valueOf(content.getHeadline1()),
>                              Field.Store.YES, Field.Index.NO));
>                 writer.addDocument(document);
>             }
>         }catch(Exception ex){
>              System.out.println(" caught a " + ex.getClass() );
>         }
>     }
>   }
>
>I opened one IndexWriter at the very beginning:
>IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);
>
>I called optimize and closed the IndexWriter after indexing the documents:
>writer.optimize();
>writer.close();
>
>My question is why it took so long. Do I need to follow the instructions
>in "How can I index HTML documents?" in the FAQ on the Lucene web site?
>
>My other question is whether I can delete a document based on the
>storyIdentity field (using IndexReader.deleteDocuments(term)). Since the
>storyIdentity field is not indexed, is there a performance issue, or
>should I index it too (and store it)?
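
On the delete question: IndexReader.deleteDocuments(term) finds its
targets through indexed terms, so a field added with Field.Index.NO has
nothing for the term to match. The concern is correctness more than
performance. The toy model below is plain Java and is not Lucene code;
ToyIndex and its methods are hypothetical names that only illustrate why
a stored-only field cannot drive a delete. In Lucene 2.x the usual choice
for an id field is probably Field.Index.UN_TOKENIZED together with
Field.Store.YES, but treat that as an assumption to check against the
version in use.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for an index: stored fields per document plus postings for indexed fields only. */
final class ToyIndex {
    private final Map<Integer, Map<String, String>> storedFields = new HashMap<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Stores every field; only fields named in indexedFieldNames get a posting (whole value as one term). */
    int addDocument(Map<String, String> fields, Set<String> indexedFieldNames) {
        int docId = nextDocId++;
        storedFields.put(docId, new HashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (indexedFieldNames.contains(field.getKey())) {
                postings.computeIfAbsent(termKey(field.getKey(), field.getValue()),
                        k -> new HashSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Deletes documents found via the postings for (fieldName, value); returns how many were removed. */
    int deleteDocuments(String fieldName, String value) {
        Set<Integer> matches = postings.remove(termKey(fieldName, value));
        if (matches == null) {
            return 0; // no posting: the field was stored only, or the value is unknown
        }
        int deleted = 0;
        for (int docId : matches) {
            if (storedFields.remove(docId) != null) {
                deleted++;
            }
        }
        return deleted;
    }

    /** Number of documents still present. */
    int size() {
        return storedFields.size();
    }

    private static String termKey(String fieldName, String value) {
        return fieldName + '\u0000' + value;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> story = Collections.singletonMap("storyIdentity", "42");

        // Stored but not indexed, as in the code posted above.
        index.addDocument(story, Collections.<String>emptySet());
        System.out.println("deleted (stored only): " + index.deleteDocuments("storyIdentity", "42"));

        // Stored and indexed as a single term.
        index.addDocument(story, Collections.singleton("storyIdentity"));
        System.out.println("deleted (indexed):     " + index.deleteDocuments("storyIdentity", "42"));
    }
}
```

In this model the first delete matches nothing because the stored-only
document never got a posting; only the indexed copy can be found and
removed.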
>
>Appreciate your help.
>
>Tony
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
