Otis,

I timed just the indexing.

thanks,
Tony

From: Otis Gospodnetic <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Index performance
Date: Thu, 12 Apr 2007 09:31:49 -0700 (PDT)

Hi Tony,
Your code looks fine to me. I'm not sure what you timed: the whole app run, just indexing, or indexing + optimizing. If you timed indexing + optimizing, leave optimization out of the timer. How long do you think this should take? Try setting maxBufferedDocs to 90.
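A rough way to see why maxBufferedDocs matters: the writer keeps added documents in memory and writes a new segment to disk each time that buffer fills, so a small buffer turns a 90-document batch into many small segment writes. The Java sketch below is a toy model, not Lucene code. The class BufferedWriterModel is hypothetical, it ignores segment merging (mergeFactor) entirely, and it assumes a starting buffer of 10 documents, which I believe is the Lucene 2.x default. In Lucene 2.x the real setting is IndexWriter.setMaxBufferedDocs(int).

```java
// Toy model, not Lucene code: a writer that keeps added documents in an
// in-memory buffer and writes one new segment whenever the buffer holds
// maxBufferedDocs documents. Segment merging is deliberately left out.
class BufferedWriterModel {
    private final int maxBufferedDocs;
    private int bufferedDocs = 0;
    private int segmentsWritten = 0;

    BufferedWriterModel(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument() {
        bufferedDocs++;
        if (bufferedDocs >= maxBufferedDocs) {
            flush();
        }
    }

    // Like close(): anything still buffered becomes one last segment.
    void close() {
        if (bufferedDocs > 0) {
            flush();
        }
    }

    private void flush() {
        segmentsWritten++; // stands in for writing one segment to disk
        bufferedDocs = 0;
    }

    int getSegmentsWritten() {
        return segmentsWritten;
    }

    // Number of segments written for numDocs documents at a given buffer size.
    static int segmentsFor(int numDocs, int maxBufferedDocs) {
        BufferedWriterModel writer = new BufferedWriterModel(maxBufferedDocs);
        for (int i = 0; i < numDocs; i++) {
            writer.addDocument();
        }
        writer.close();
        return writer.getSegmentsWritten();
    }

    public static void main(String[] args) {
        System.out.println(segmentsFor(90, 10)); // 9
        System.out.println(segmentsFor(90, 90)); // 1
    }
}
```

With 90 documents the model writes 9 segments at a buffer of 10 and a single segment at a buffer of 90, which is roughly the effect of raising maxBufferedDocs to 90.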

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message ----
From: Tony Qian <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Thursday, April 12, 2007 11:23:36 AM
Subject: Index performance

All,

Sorry for the long email. I have two questions about indexing. My data consists
of an id, a short headline and the story text. The story text contains some
HTML tags. Here is an example.

In early 2005, it seemed that Shamita Shetty had finally arrived after a
high profile debut in <i>Mohabbatein</i> [2000]. <br /><br />With 3 of her
films releasing in the first half of 2005, <i>Bewafa, Zeher </i>and
<i>Fareb</i>, and the first two ending up making good money, it seemed that
the gorgeous girl had finally started making her presence felt. <i>Zeher</i>
helped her being recognized as an actor and her fans had all the reasons to
believe that they would be seeing more of her in the coming months. <br
/><br />Surprisingly there has been absolutely no movement ever since then
from Shamita's end as she hasn't had a single release in almost 2
years now. All of this would change though with the arrival of <i>Cash</i>
where she is one of the leading ladies apart from Esha Deol and Dia Mirza.
<br /><br />An action thriller popcorn entertainer, the film is directed by
Anubahv Sinha of <i>Dus</i> fame and stars Ajay Devgan, Suneil Shetty,
Ritesh Deshmukh and Zayed Khan in the lead.<br /><br />
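Since the story text carries inline HTML such as `<i>` and `<br />`, one option is to strip the tags before building the storyText field, e.g. passing HtmlToText.strip(content.getStoryText()) to the Field constructor. The sketch below is a naive regex cleanup, and HtmlToText is a hypothetical helper; for arbitrary markup a proper HTML parser is the safer choice. Stripping mostly changes what gets indexed (tokens like "i" and "br" would likely otherwise become terms) and is probably not, by itself, why indexing is slow.

```java
import java.util.regex.Pattern;

// Naive HTML-to-text cleanup for simple markup such as <i>...</i> and <br />.
// A regular expression is not an HTML parser: comments, <script> blocks and
// '>' inside attribute values are not handled.
class HtmlToText {
    private static final Pattern LINE_BREAK = Pattern.compile("(?i)<br\\s*/?>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String strip(String html) {
        // <br> separates words, so replace it with a space rather than nothing.
        String text = LINE_BREAK.matcher(html).replaceAll(" ");
        // Other tags (e.g. <i>) are inline, so they are simply removed.
        text = ANY_TAG.matcher(text).replaceAll("");
        // Decode a few common entities; &amp; goes last to avoid double decoding.
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by removed tags.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String story = "debut in <i>Mohabbatein</i> [2000]. <br /><br />With 3 of her films";
        System.out.println(strip(story));
        // debut in Mohabbatein [2000]. With 3 of her films
    }
}
```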

I tried to index it. It took 7-10 seconds to index about 90 documents.
Here is my code:

  // Adds one Lucene Document per story to an already-open IndexWriter.
  // Any IOException from addDocument() propagates to the caller.
  static void indexContents(IndexWriter writer, List storyContentList)
      throws IOException {
      if (storyContentList == null || storyContentList.isEmpty()) {
          return;
      }
      Iterator itr = storyContentList.iterator();
      while (itr.hasNext()) {
          StoryContents content = (StoryContents) itr.next();
          Document document = new Document();
          // Story text: stored and tokenized so it can be searched.
          document.add(new Field("storyText", content.getStoryText(),
                                 Field.Store.YES, Field.Index.TOKENIZED));
          // Story id: stored only (Field.Index.NO), so it is not searchable.
          document.add(new Field("storyIdentity",
                                 String.valueOf(content.getStoryIdentity()),
                                 Field.Store.YES, Field.Index.NO));
          // Headline: stored only.
          document.add(new Field("headline1",
                                 String.valueOf(content.getHeadline1()),
                                 Field.Store.YES, Field.Index.NO));
          writer.addDocument(document);
      }
  }

I opened one IndexWriter at the very beginning:
IndexWriter writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true);

After indexing the documents, I called optimize() and closed the IndexWriter:
writer.optimize();
writer.close();
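To keep optimize() out of the indexing timer, as Otis suggests, each phase can be timed on its own. The sketch below is plain Java with a hypothetical PhaseTimer helper; in this program the two steps would wrap indexContents(writer, storyContentList) and writer.optimize(). Here they are replaced by Thread.sleep() placeholders so the example compiles without Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: times each named phase separately so that indexing
// and optimize() show up as two numbers instead of one.
class PhaseTimer {
    // A step that may throw, since addDocument() and optimize() throw IOException.
    interface Step {
        void run() throws Exception;
    }

    private final List<String> lines = new ArrayList<>();

    // Runs the step, records "name: N ms", and returns the elapsed milliseconds.
    long time(String name, Step step) throws Exception {
        long start = System.nanoTime();
        step.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        lines.add(name + ": " + elapsedMs + " ms");
        return elapsedMs;
    }

    String report() {
        return String.join("\n", lines);
    }

    public static void main(String[] args) throws Exception {
        PhaseTimer timer = new PhaseTimer();
        // Placeholder for: indexContents(writer, storyContentList);
        timer.time("indexing", () -> Thread.sleep(50));
        // Placeholder for: writer.optimize();
        timer.time("optimize", () -> Thread.sleep(10));
        System.out.println(timer.report());
    }
}
```

writer.close() would then run after both timers, so neither number includes it.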

My question is: why did it take so long? Do I need to follow the instructions
in "How can I index HTML documents?" in the FAQ on the Lucene web site?

Another question is whether I can delete a document based on the storyIdentity
field (using IndexReader.deleteDocuments(term)). Since the storyIdentity field
is not indexed, is there a performance issue, or should I index it too (and
store it)?
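As I understand it, deleting by term goes through the inverted index: IndexReader.deleteDocuments(term) removes the documents whose postings contain that term. A field added with Field.Index.NO is stored but has no postings, so a delete on storyIdentity would presumably match nothing at all, which is a correctness problem rather than a performance one. The toy model below (ToyIndex is hypothetical and is not how Lucene stores data) shows the difference. In Lucene 2.x an id field is commonly added with Field.Store.YES and Field.Index.UN_TOKENIZED so it is indexed as one whole term.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy inverted index used only to illustrate delete-by-term; not Lucene code.
class ToyIndex {
    // field -> term -> ids of documents containing that term ("postings")
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDocId = 0;

    // Adds a document; only fields listed in indexedFields get postings.
    // Each indexed value is treated as one untokenized term.
    int addDocument(Map<String, String> fields, Set<String> indexedFields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (indexedFields.contains(field.getKey())) {
                postings.computeIfAbsent(field.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(field.getValue(), k -> new HashSet<>())
                        .add(docId);
            }
        }
        return docId;
    }

    // Marks every document whose postings contain (field, term) as deleted
    // and returns how many were newly deleted.
    int deleteDocuments(String field, String term) {
        Map<String, Set<Integer>> terms =
                postings.getOrDefault(field, Collections.emptyMap());
        int count = 0;
        for (int docId : terms.getOrDefault(term, Collections.emptySet())) {
            if (deleted.add(docId)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> story = new HashMap<>();
        story.put("storyIdentity", "42");
        story.put("storyText", "cash");

        // storyIdentity stored only (like Field.Index.NO): nothing to delete by.
        ToyIndex notIndexed = new ToyIndex();
        notIndexed.addDocument(story, Collections.singleton("storyText"));
        System.out.println(notIndexed.deleteDocuments("storyIdentity", "42")); // 0

        // storyIdentity indexed as a single term: the delete finds the document.
        ToyIndex indexed = new ToyIndex();
        indexed.addDocument(story, new HashSet<>(Arrays.asList("storyIdentity", "storyText")));
        System.out.println(indexed.deleteDocuments("storyIdentity", "42")); // 1
    }
}
```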

Appreciate your help.

Tony



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




