On Mar 31, 2005, at 6:36 PM, Yagnesh Shah wrote:
try {
fis = new FileInputStream(f);
HTMLParser parser = new HTMLParser(fis);
// Add the tag-stripped contents as a Reader-valued Text field so it will
// get tokenized and indexed.
// doc.add(new Field("contents", parser.getReader()));
LineNumberReader reader = new LineNumberReader(parser.getReader());
for (String l = reader.readLine(); l != null; l = reader.readLine())
// System.out.println(l);
doc.add(Field.Text("contents", l));
Notice that your loop here adds a separate "contents" field for *every* line read. Without braces, the loop body is only the single statement ending at the first semicolon, which is the doc.add call.
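A minimal sketch of one way around this, assuming the goal is a single "contents" field per document: read the whole Reader into one String first, then add it once. ReaderText and readAll are hypothetical names used only for illustration; they are not part of Lucene or the demo code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene or its demo code.
class ReaderText {
    // Drains the reader into one String, ending each line with '\n'.
    static String readAll(Reader in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(in)) {
            // Braces make the loop body explicit, so nothing else
            // is accidentally executed once per line.
            for (String line = reader.readLine(); line != null; line = reader.readLine()) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

With a helper like that, the loop in the quoted code could probably collapse to one call, e.g. doc.add(Field.Text("contents", ReaderText.readAll(parser.getReader())));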
Look at using Luke to explore your index. Try indexing just a dummy String:
doc.add(Field.Text("contents", "some dummy text"));
to show that it works. Always always always simplify a complicated situation by doing the most obvious thing that _should_ work.
Also, the demo Lucene code is not really designed to be used in a production application (sadly), so you're better off borrowing code from the many articles or our book to begin with.
Erik
// Add the summary as a field that is stored and returned with
// hit documents for display.
doc.add(new Field("summary", parser.getSummary(), Field.Store.YES, Field.Index.NO));
// Add the title as a field so that it can be searched and is stored.
doc.add(new Field("title", parser.getTitle(), Field.Store.YES, Field.Index.TOKENIZED));
}
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 30, 2005 7:38 PM
To: java-user@lucene.apache.org
Subject: Re: HTML pages highlighter
On Mar 30, 2005, at 4:46 PM, Yagnesh Shah wrote:
Hi! Eric,
Erik - with a 'k' - Sorry, I let it slide once though :)
I tried to modify it like this, but I get a compile error. Do you have a code snippet of highlighting code that pulls the contents from the original source?
I have a whole book full of code examples :) http://www.lucenebook.com - Grab the source code and look in src/lia/tools at Highlight*.java
Or do you know how I can store the field?
doc.add(new Field("contents", parser.getReader(), Field.Store.YES, Field.Index.NO));
You cannot store it with a Reader. You need to use Field.Text(String, String), or one of the other variations.
Erik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]