Re: Ignoring Specific Tags with Digester

Paul J DeCoursey Thu, 27 Jul 2006 14:30:27 -0700

Simon Kitching wrote:

On Thu, 2006-07-27 at 09:59 -0400, rjn wrote:

Hi Everyone,


I'm trying to write a Syndication Feed parser using Digester, however
I'm running into a stumbling block.  Many feeds have HTML in the
entries such as <a>, <br>, etc.   Digester tries to parse these as XML
tags, thus leading to blanks in the data I pull out.  I was wondering
if there was a way to set Digester to ignore specific tags (in this
case, the HTML tags)?


No. Digester uses a standard xml parser to parse its input. That means
the input *must* be valid xml. If the input you have to handle isn't
valid xml, then you can't use an xml parser to parse it.
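To make Simon's point concrete, here is a minimal sketch that uses only the JDK's built-in SAX parser (JAXP) to check whether a fragment is well-formed XML before it is handed to Digester. The class `WellFormedCheck` and its `isWellFormed` method are hypothetical names for illustration, not part of Digester. An HTML-style unclosed `<br>` is a fatal well-formedness error, so any standard XML parser rejects it. The same applies to HTML entities such as `&nbsp;` that XML does not predefine.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML
// according to the JDK's default SAX parser.
class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // DefaultHandler ignores all content; fatal errors still throw.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // malformed input, e.g. an unclosed <br>
        } catch (Exception e) {
            throw new RuntimeException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<description>Hi<br/>there</description>"));
        System.out.println(isWellFormed("<description>Hi<br>there</description>"));
    }
}
```

The first call succeeds because `<br/>` is self-closed; the second fails because `<br>` is never closed before `</description>`.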

Perhaps you can use the NekoHTML parser to convert the input to valid
XML?
  http://java-source.net/open-source/html-parsers/nekohtml

Regards,

Simon

I don't think that was the question. I'm guessing the XML is valid; it's just not dealing with the XHTML part of it correctly. I'm not familiar enough with Digester to know the solution, however.
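If the feed is valid XML, the embedded `<a>` and `<br/>` are real child elements. One plausible cause of the "blanks" is that a rule reading only the parent element's own body text sees just the text fragments between those children. The sketch below assumes that situation. It uses plain JAXP SAX rather than Digester's own API, and the names `InnerMarkupCollector` and `collect` are hypothetical. It gathers the full inner content of a chosen element and re-serializes nested tags as markup instead of treating them as separate elements. Digester's `NodeCreateRule`, which builds a DOM fragment for a matched element, may also be worth checking for this purpose.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the inner content of every element named
// `target`, re-serializing nested child elements (e.g. <a>, <br/>) as markup.
class InnerMarkupCollector extends DefaultHandler {
    private final String target;
    private final List<String> results = new ArrayList<>();
    private StringBuilder buf; // non-null while inside a target element
    private int depth;         // nesting depth below the target element

    InnerMarkupCollector(String target) { this.target = target; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (buf == null) {
            if (qName.equals(target)) { buf = new StringBuilder(); depth = 0; }
            return;
        }
        depth++; // a nested element: write it back out as a start tag
        buf.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            buf.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i), true)).append('"');
        }
        buf.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buf == null) return;
        if (depth == 0) { // closing the target element itself
            results.add(buf.toString());
            buf = null;
        } else {
            depth--;
            buf.append("</").append(qName).append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buf != null) buf.append(escape(new String(ch, start, length), false));
    }

    // Re-escape special characters so the collected string stays valid markup.
    private static String escape(String s, boolean attr) {
        String out = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return attr ? out.replace("\"", "&quot;") : out;
    }

    static List<String> collect(String xml, String target) {
        InnerMarkupCollector handler = new InnerMarkupCollector(target);
        try {
            // The default factory is not namespace-aware, so qName holds the raw tag name.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.results;
    }

    public static void main(String[] args) {
        String feed = "<item><description>See <a href=\"http://example.com/\">this</a><br/>now</description></item>";
        System.out.println(collect(feed, "description"));
    }
}
```

A self-closed `<br/>` comes back as `<br></br>`, which is equivalent XML. The collected string could then be passed to whatever object the Digester rules build; that wiring is not shown here.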


pd