Otis,
I timed only the indexing step.
thanks,
Tony
>From: Otis Gospodnetic <[EMAIL PROTECTED]>
>Reply-To: java-user@lucene.apache.org
>To: java-user@lucene.apache.org
>Subject: Re: Index performance
>Date: Thu, 12 Apr 2007 09:31:49 -0700 (PDT)
>
>Hi Tony,
>Your code looks fine to me. I'm not sure what you timed - the whole app
>run, just indexing, indexing + optimizing... If you timed indexing +
>optimizing, leave optimization out of the timer. How long do you think
>this should take? Try setting maxBufferedDocs to 90.
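
One way to keep optimization out of the timer is to time each phase separately. The sketch below is a minimal illustration using only the JDK. `PhaseTimerSketch` and `simulateWork` are hypothetical names, not part of Lucene, and the sleeps stand in for the real calls to `indexContents(writer, storyContentList)` and `writer.optimize()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): records elapsed time per named phase.
class PhaseTimerSketch {
    // Insertion-ordered so phases print in the order they ran.
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    // Runs one phase and adds its elapsed time to the total for that name.
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            elapsedNanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Elapsed milliseconds for a phase, or 0 if the phase never ran.
    long millis(String phase) {
        return elapsedNanos.getOrDefault(phase, 0L) / 1_000_000L;
    }

    Map<String, Long> phases() {
        return elapsedNanos;
    }

    // Stand-in for real work; an assumption made only for this sketch.
    static void simulateWork(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PhaseTimerSketch timer = new PhaseTimerSketch();
        // In the real application these would wrap the indexing loop and
        // the optimize call, with checked exceptions handled inside.
        timer.time("indexing", () -> simulateWork(20));
        timer.time("optimize", () -> simulateWork(50));
        for (String phase : timer.phases().keySet()) {
            System.out.println(phase + ": " + timer.millis(phase) + " ms");
        }
    }
}
```

Reporting the two numbers side by side shows whether the 7 to 10 seconds is spent adding documents or merging segments. Because `Runnable` cannot throw checked exceptions, a real caller would need to catch or wrap `IOException` inside each lambda.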
>
>Otis
>
>----- Original Message ----
>From: Tony Qian <[EMAIL PROTECTED]>
>To: java-user@lucene.apache.org
>Sent: Thursday, April 12, 2007 11:23:36 AM
>Subject: Index performance
>
>All,
>
>Sorry for the long email. I have two questions on indexing. My data
>consists of an id, a short headline, and story text. The story text
>contains some HTML tags. Here is an example.
>
>In early 2005, it seemed that Shamita Shetty had finally arrived after a
>high profile debut in <i>Mohabbatein</i> [2000]. <br /><br />With 3 of
>her films releasing in the first half of 2005, <i>Bewafa, Zeher </i>and
><i>Fareb</i>, and the first two ending up making good money, it seemed
>that the gorgeous girl had finally started making her presence felt.
><i>Zeher</i> helped her being recognized as an actor and her fans had
>all the reasons to believe that they would be seeing more of her in the
>coming months. <br /><br />Surprisingly there has been absolutely no
>movement ever since then from Shamita's end as she hasn't had a single
>release in almost 2 years now. All of this would change though with the
>arrival of <i>Cash</i> where she is one of the leading ladies apart from
>Esha Deol and Dia Mirza. <br /><br />An action thriller popcorn
>entertainer, the film is directed by Anubahv Sinha of <i>Dus</i> fame
>and stars Ajay Devgan, Suneil Shetty, Ritesh Deshmukh and Zayed Khan in
>the lead.<br /><br />
>
>I tried to index it. It took 7 to 10 seconds to index about 90 documents.
>Here is my code:
>
>static void indexContents(IndexWriter writer, List storyContentList)
>        throws IOException {
>    if (storyContentList != null && storyContentList.size() != 0) {
>        try {
>            Iterator itr = storyContentList.iterator();
>            while (itr.hasNext()) {
>                StoryContents content = (StoryContents) itr.next();
>                Document document = new Document();
>                document.add(new Field("storyText",
>                        content.getStoryText(),
>                        Field.Store.YES, Field.Index.TOKENIZED));
>                document.add(new Field("storyIdentity",
>                        String.valueOf(content.getStoryIdentity()),
>                        Field.Store.YES, Field.Index.NO));
>                document.add(new Field("headline1",
>                        String.valueOf(content.getHeadline1()),
>                        Field.Store.YES, Field.Index.NO));
>                writer.addDocument(document);
>            }
>        } catch (Exception ex) {
>            System.out.println(" caught a " + ex.getClass());
>        }
>    }
>}
>
>I opened one IndexWriter at the very beginning:
>IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(),
>true);
>
>I called optimize() and closed the IndexWriter after indexing the documents:
>writer.optimize();
>writer.close();
>
>My question is why it took so long. Do I need to follow the instructions
>under "How can I index HTML documents?" in the FAQ on the Lucene web site?
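
To make the HTML question concrete, the sketch below strips tags such as `<i>` and `<br />` from the story text before it would be handed to the analyzer. It is a rough illustration using only the JDK, not the approach the Lucene FAQ describes. `HtmlStripSketch` and `stripTags` are hypothetical names, and a regular expression is a crude substitute for a real HTML parser. Whether the tags contribute to the indexing time reported above is not established here.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a Lucene class): crude tag removal for short snippets.
class HtmlStripSketch {
    // Matches anything that looks like a tag, e.g. <i>, </i>, <br />.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Replaces each tag with a space, then collapses runs of whitespace.
    // Does not decode entities such as &amp; and does not handle comments,
    // scripts, or a literal '<' in text.
    static String stripTags(String html) {
        if (html == null) {
            return "";
        }
        String noTags = TAG.matcher(html).replaceAll(" ");
        return WHITESPACE.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String story = "a high profile debut in <i>Mohabbatein</i> [2000]. "
                + "<br /><br />With 3 of her films";
        // prints: a high profile debut in Mohabbatein [2000]. With 3 of her films
        System.out.println(stripTags(story));
    }
}
```

Replacing each tag with a space rather than the empty string keeps words on either side of a `<br />` from being glued together, at the cost of splitting a word that has inline markup in the middle of it.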
>
>My other question is whether I can delete a document based on the
>storyIdentity field (using IndexReader.deleteDocuments(term)). Since the
>storyIdentity field is not indexed, is there a performance issue, or
>should I index it too (and store it)?
>
>Appreciate your help.
>
>Tony
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>