Re: HTML pages highlighter

Erik Hatcher Thu, 31 Mar 2005 17:04:22 -0800

On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:

    try {
      fis = new FileInputStream(f);
      HTMLParser parser = new HTMLParser(fis);
      // Add the tag-stripped contents as a Reader-valued Text field so it will
      // get tokenized and indexed.
      // doc.add(new Field("contents", parser.getReader()));
      LineNumberReader reader = new LineNumberReader(parser.getReader());
      for (String l = reader.readLine(); l != null; l = reader.readLine())
        // System.out.println(l);
        doc.add(Field.Text("contents", l));

Notice that your loop here is adding a "contents" field for *every* line read. The for statement has no braces, so its body is the single statement that ends at the first semicolon, which is the doc.add call.
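To make the point about the brace-less loop concrete, here is a minimal sketch in plain Java. It does not use Lucene: a List<String> stands in for the document's fields, and the class and method names (ContentsLoopDemo, addPerLine, addOnce) are made up for illustration. The second method is only one possible alternative, assuming a single "contents" value is what was intended.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ContentsLoopDemo {

    // Mirrors the quoted loop: with no braces, the one statement after the
    // for header is the loop body, so it runs once per line read.
    static List<String> addPerLine(String text) {
        List<String> fields = new ArrayList<>(); // hypothetical stand-in for doc.add(...)
        LineNumberReader reader = new LineNumberReader(new StringReader(text));
        try {
            for (String l = reader.readLine(); l != null; l = reader.readLine())
                fields.add(l);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    // Possible alternative: collect every line first, then add a single value.
    static List<String> addOnce(String text) {
        List<String> fields = new ArrayList<>();
        StringBuilder contents = new StringBuilder();
        LineNumberReader reader = new LineNumberReader(new StringReader(text));
        try {
            for (String l = reader.readLine(); l != null; l = reader.readLine()) {
                contents.append(l).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fields.add(contents.toString());
        return fields;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";
        System.out.println(addPerLine(text).size()); // one entry per line
        System.out.println(addOnce(text).size());    // a single entry
    }
}
```

Whether the per-line fields are the actual cause of the reported problem is not settled in this thread; the sketch only shows what the loop does.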

Look at using Luke to explore your index. Try indexing just a dummy String:

        doc.add(Field.Text("contents", "some dummy text"));

to show that it works. Always always always simplify a complicated situation by doing the most obvious thing that _should_ work.

Also, the demo Lucene code is not really designed to be used in a production application (sadly), so you're better off borrowing code from the many articles or our book to begin with.

        Erik

      // Add the summary as a field that is stored and returned with
      // hit documents for display.
      doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));

      // Add the title as a field that can be searched and that is stored.
      doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
    }

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 30, 2005 7:38 PM
To: [email protected]
Subject: Re: HTML pages highlighter

On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:

Hi! Eric,


Erik - with a 'k' - Sorry, I let it slide once though :)

        I tried to modify that with this, but I get a compile error. Do you have
any code snippet of highlighting code to pull the contents from the
original source?


I have a whole book full of code examples :)
http://www.lucenebook.com - Grab the source code and look in
src/lia/tools at Highlight*.java

 Or do you know how I can store the field?

      doc.add(new Field("contents", parser.getReader(),
Field.Store.YES, Field.Index.NO));


You cannot store it with a Reader.  You need to use Field.Text(String,
String), or one of the other variations.
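Since a Reader-valued field cannot be stored, one way to get a String to hand to Field.Text(String, String) is to drain the Reader first. The following is a rough sketch using only java.io; the class and method names (ReaderToString, readAll) are hypothetical, and the Lucene call appears only in a comment so the snippet runs without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderToString {

    // Reads everything the Reader supplies into one String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A StringReader stands in for parser.getReader() here.
        String contents = readAll(new StringReader("tag-stripped page text"));
        // With Lucene available, the String could then be passed on as:
        //   doc.add(Field.Text("contents", contents));
        System.out.println(contents);
    }
}
```

Note that this holds the whole page text in memory, which is the trade-off for being able to store it.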

        Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


