On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
    try {
      fis = new FileInputStream(f);
      HTMLParser parser = new HTMLParser(fis);

// Add the tag-stripped contents as a Reader-valued Text field so it will
// get tokenized and indexed.
// doc.add(new Field("contents", parser.getReader()));
LineNumberReader reader = new LineNumberReader(parser.getReader());
for (String l = reader.readLine(); l != null; l = reader.readLine())
// System.out.println(l);
doc.add(Field.Text("contents", l));

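Notice that your loop here is adding a "contents" field for *every* line read. Without braces, the body of the for loop ends at the first semicolon, which is the doc.add(...) call; the commented-out println does not count as a statement.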
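To make the loop scoping concrete, here is a JDK-only sketch (no Lucene needed) that mimics the loop above. The class name, the method names, and the List<String> standing in for the Document are hypothetical. In real code the single accumulated value would go to one doc.add(Field.Text("contents", ...)) call. Because readLine() strips line terminators, the sketch re-appends '\n' to each line.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class LoopBodyDemo {

    // Mirrors the quoted loop. "fields" is a hypothetical stand-in for the
    // Document: each add() corresponds to one doc.add(Field.Text("contents", ...)).
    static List<String> oneFieldPerLine(Reader source) {
        List<String> fields = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            for (String l = reader.readLine(); l != null; l = reader.readLine())
                // System.out.println(l);   <- a comment is not a statement,
                fields.add(l);              //    so this line IS the loop body
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    // Braces make the loop body explicit: lines are accumulated, and the
    // single "contents" value is added once, after the loop finishes.
    static List<String> singleField(Reader source) {
        List<String> fields = new ArrayList<>();
        StringBuilder contents = new StringBuilder();
        try (LineNumberReader reader = new LineNumberReader(source)) {
            for (String l = reader.readLine(); l != null; l = reader.readLine()) {
                contents.append(l).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fields.add(contents.toString());
        return fields;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";
        System.out.println(oneFieldPerLine(new StringReader(text)).size()); // prints 3
        System.out.println(singleField(new StringReader(text)).size());     // prints 1
    }
}
```

Lucene accepts several fields with the same name, so the original loop compiles and runs; it just produces one "contents" field per line instead of one per document.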


Look at using Luke to explore your index. Try indexing just a dummy String:

        doc.add(Field.Text("contents", "some dummy text"));

to show that it works. Always, always, always simplify a complicated situation by doing the most obvious thing that _should_ work.

Also, the demo Lucene code is not really designed to be used in a production application (sadly), so you're better off borrowing code from the many articles or our book to begin with.

        Erik



// Add the summary as a field that is stored and returned with
// hit documents for display.
doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));


// Add the title as a field that it can be searched and that is stored.
doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
}




-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 30, 2005 7:38 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter



On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:

Hi! Eric,

Erik, with a 'k'. Sorry, I let it slide once, though :)

        I tried to modify that with this, but I get a compile error. Do you have
a code snippet for highlighting that pulls the contents from the
original source?

I have a whole book full of code examples :) http://www.lucenebook.com - Grab the source code and look in src/lia/tools at Highlight*.java

Or, do you know how I can store the field?

      doc.add(new Field("contents", parser.getReader(),
                        Field.Store.YES, Field.Index.NO));

You cannot store it with a Reader. You need to use Field.Text(String, String), or one of the other variations.
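For storing, the Reader from parser.getReader() has to be turned into a String first. Below is a minimal JDK-only sketch, assuming the Lucene 1.4 Field.Text(String, String) factory mentioned above. ReaderToString and readAll are hypothetical helper names, not Lucene API, and the Lucene call is left as a comment so the block compiles without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderToString {

    // Drains a Reader (e.g. the one returned by parser.getReader()) into a
    // String. Unlike readLine(), read(char[]) keeps the original line breaks.
    // Closing the Reader is left to the caller.
    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String contents = readAll(new StringReader("Title\r\nBody text"));
        System.out.println(contents.length()); // prints 16
        // With Lucene 1.4 on the classpath, the String could then be stored
        // (and indexed) with a single call, assuming a Document named doc:
        // doc.add(Field.Text("contents", contents));
    }
}
```

Storing the full text means the whole document is held in memory once while indexing, but that stored copy is what a highlighter later pulls back out for display.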

        Erik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

