rjn wrote:
Thanks for the responses.  Yeah, so the XML file is valid; it's just
that some of the tags have HTML embedded within them.  For example:

<entry><p>This is text.</p></entry>

So Digester sees this as:
entry/p

Rather than just entry.  I imagine I could just download the XML
documents and, knowing the structure, search for the entry fields and
cut out the text, then store that separately.  I was just hoping
there was a way to list tags to ignore, for example <p>, <br>, etc.
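If all you need is the text, the "list of tags to ignore" idea can be sketched outside Digester with a plain JDK SAX handler. This is not a Digester feature; it treats every tag nested inside <entry> as ignorable, dropping the tag and keeping its character data. The class and method names are hypothetical, and matching on the raw element name "entry" assumes an un-prefixed feed parsed without namespace awareness (the JDK default).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: collects the plain text of each <entry>, ignoring nested markup. */
final class EntryTextCollector extends DefaultHandler {
    private final List<String> entries = new ArrayList<>();
    private StringBuilder current; // non-null only while inside an <entry>
    private int nestedDepth;       // how many tags deep we are below <entry>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && "entry".equals(qName)) {
            current = new StringBuilder();
            nestedDepth = 0;
        } else if (current != null) {
            nestedDepth++;         // <p>, <br>, <a>, ...: drop the tag, keep reading text
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;                // outside any <entry>
        }
        if (nestedDepth == 0) {    // this is the closing </entry>
            entries.add(current.toString().trim());
            current = null;
        } else {
            nestedDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length); // SAX may split text into several calls
        }
    }

    /** Parses the feed and returns one plain-text string per <entry>. */
    static List<String> extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            EntryTextCollector handler = new EntryTextCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.entries;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse feed", e);
        }
    }
}
```

Dropping <br> this way glues the words on either side together; emitting a space or newline for selected tag names in startElement would be a small extension.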

Thanks anyway,

On 7/27/06, rjn <[EMAIL PROTECTED]> wrote:
Hi Everyone,

I'm trying to write a Syndication Feed parser using Digester, however
I'm running into a stumbling block.  Many feeds have HTML in the
entries such as <a>, <br>, etc.   Digester tries to parse these as XML
tags, thus leading to blanks in the data I pull out.  I was wondering
if there was a way to set Digester to ignore specific tags (in this
case, the HTML tags)?

Thanks,
RJ




Or tags to just copy as text. I think Simon had your answer with NodeCreateRule. If I'm reading it correctly, it creates a DOM node (an Element, or a DocumentFragment if you ask for one) for the node in question and its children, which you could pass to an XSLT processor to serialize into the text you want, keeping the HTML you wish to keep, or stripping the tags if you wish.
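A rough JDK-only sketch of the serialization step described above: parse the feed with DOM, take the children of each <entry>, and run them through an identity Transformer (one created without a stylesheet) to get the HTML back as a string, or call getTextContent() to strip the tags. This does not use Digester itself; with NodeCreateRule, Digester would build the DOM node for you during parsing and the serialization would be the same. Class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: turns the content of each <entry> back into a string. */
final class EntryMarkup {
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse feed", e);
        }
    }

    /** Keeps the HTML: returns the inner markup of each <entry>, e.g. "<p>This is text.</p>". */
    static List<String> innerMarkup(String xml) {
        NodeList entries = parse(xml).getElementsByTagName("entry");
        List<String> result = new ArrayList<>();
        try {
            // An identity Transformer (no stylesheet) simply serializes the DOM it is given.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            for (int i = 0; i < entries.getLength(); i++) {
                StringBuilder markup = new StringBuilder();
                // Serialize each child node in turn, so the <entry> tag itself is left out.
                for (Node child = entries.item(i).getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    StringWriter out = new StringWriter();
                    identity.transform(new DOMSource(child), new StreamResult(out));
                    markup.append(out);
                }
                result.add(markup.toString());
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize entries", e);
        }
        return result;
    }

    /** Strips the tags: returns only the text content of each <entry>. */
    static List<String> plainText(String xml) {
        NodeList entries = parse(xml).getElementsByTagName("entry");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.getLength(); i++) {
            result.add(entries.item(i).getTextContent());
        }
        return result;
    }
}
```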

Paul


