Ok I got it fixed and though I would respond back so it was in the archive for 
the next poor soul...

Here's the code I used:
        StringBean sb = new StringBean ();
        String htmlSource = record.get("column14").toString().trim(); 
        Parser parser = new Parser(new Lexer(htmlSource));
        try {
            parser.visitAllNodesWith(sb);
        } catch (ParserException pe) {
            System.out.println("Index ParserException:" + pe.getMessage());
        }
        value = sb.getStrings ();

-----Original Message-----
From: Ross Rankin 
Sent: Thursday, July 13, 2006 4:34 PM
To: java-user
Subject: HTMLParser

Since I cannot seem to access the HTMLParser mailing list and I saw the
library recommended here, I thought someone here that has used it
successfully can help me out.  

I have HTML text stored in a database field which I want to add to a
Lucene document, but I want to remove the HTML tags, so HTMLParser
looked like it would fit the bill.

 

First, it does not seem to be parsing… hence my first problem and it
also is throwing an exception along with this phrase sprinkled around
“(No such file or directory)”.  

 

I think I may be using it wrong, so here’s what I have done.  In my
object where I create my document, I have the following code:

        StringExtractor extract = new
StringExtractor(record.get("column14").toString().trim());

        try {

            value = extract.extractStrings(false);

        } catch (ParserException pe) {

            System.out.println("Index Long Description Parser
Exception:" + pe.getMessage() );

            value = "";

        }

 

What I get out in value is like the following:

<LI><FONT size=2>Crystal Clear III and 3D combfilter for natural, sharp
images with enhanced quality </FONT>

<LI><FONT size=2>Compact and sleek design </FONT>

<LI><FONT size=2>Incredible Surround (No such file or directory)

 

So the tags are still there and oddly the ‘(No such file or directory)’
phrase is added which is not in the original text.

 

Then I get a ParserException.

 

What am I doing wrong?

 

Thanks,

Ross




---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to